INTRODUCTION

Almost every statistician has used simple linear regression many times; it is probably the most widely used statistical procedure. If more than one dependent variable is present, we enter the realm of multivariate regression. In both univariate and multivariate regression we can estimate the regression coefficients, find confidence intervals for them, and test whether they are equal to a known matrix. However, another kind of problem exists in multivariate regression that does not exist in univariate regression. In multivariate regression, the regression coefficient matrix may not be of full row rank, i.e., there may exist unknown linear restrictions on the regression coefficient matrix. We may want to estimate the regression coefficient matrix and the unknown linear restrictions under the hypothesis that the linear restrictions do exist. For instance, when we estimate one linear restriction, we are usually trying to find the linear combination of the elements of each column of the regression coefficient matrix which equals some unknown quantity.

We now define precisely the model and hypothesis to which we have been referring:

$$x_i = \Xi f_i + e_i, \quad i = 1, 2, \ldots, N,$$
$$B\Xi = \alpha a,$$

where $x_i$ is a $p$-dimensional vector of observations, $\Xi$ is the unknown $p \times k$ ($k \ge p$) regression coefficient matrix, $f_i$ is a $k$-dimensional vector of independent variables, $e_i$ is a $p$-dimensional error vector, $B$ is an $r \times p$ ($r < p$) matrix of linear restrictions, $\alpha$ is an unknown $r \times s$ ($s \le r$) matrix which provides a basis for the space spanned by the columns of $B\Xi$, and $a$ is a known $s \times k$ matrix. The matrix form of the above equations is

$$X = \Xi F + E, \tag{0.0.1}$$

$$B\Xi = \alpha a, \tag{0.0.2}$$

where

$$X = (x_1, x_2, \ldots, x_N), \qquad F = (f_1, f_2, \ldots, f_N), \qquad E = (e_1, e_2, \ldots, e_N).$$

T. W. Anderson [1951a] found the maximum likelihood estimators (MLE's) of the parameters $B$, $\Xi$, and $\Sigma$ when $a$ is the zero matrix. Later, Villegas [1961] found the MLE's of $B$, $\Xi$, $\Sigma$, and $\alpha$ in the above model when $F$ is the design matrix associated with the MANOVA model and $B$ is a $1 \times p$ matrix. Villegas's model can be called the single linear functional relationship model with replications (Moran [1971], Madansky [1959]). When $F$ is the design matrix associated with the MANOVA model, each column of $\Xi$ is the mean vector for a group of observations. In many cases the number of groups increases with the sample size. This situation is itself a special case of the more general case in which the number of parameters increases with the sample size. Villegas does discuss the consistency of his estimators when the number of groups increases with the sample size.

In Chapter 1, we estimate the parameters in the model and hypothesis specified by (0.0.1) and (0.0.2). We also give several special cases of our model, including several models which resemble a model discussed by Gleser and Watson [1973]. Our discussion of the consistency of the estimators is directed mainly to cases in which the number of parameters does not stay fixed as the sample size increases.

One of the biggest advantages of obtaining maximum likelihood estimators is that we can usually use them to derive likelihood ratio tests. For many multivariate problems, the exact distribution of the likelihood ratio test statistic is exceedingly complicated. However, the asymptotic distribution of $-2\log\lambda$, where $\lambda$ is the likelihood ratio test statistic, is usually a chi-square distribution. In Chapter 2, we use the estimators derived in Chapter 1 to obtain the likelihood ratio test statistic for testing

$$H_0: B\Xi = \alpha a \quad \text{versus} \quad H_1: B\Xi \ne \alpha a.$$

Since the exact distribution of this statistic is intractable, we find its asymptotic distribution. Our results show that the asymptotic distribution of the test statistic depends on how the number of parameters increases with the sample size. It is noteworthy that in several cases $-2\log\lambda$ does not have an asymptotic chi-square distribution.

The basic model discussed in the first two chapters is commonly called the classical multivariate linear regression model. Another type of linear model which has been discussed in the literature is the "growth curves" model (Cochran and Bliss [1948], Shrikhande [1954], and Gleser and Olkin [1964, 1969]). In this model we observe $N$ independent $p \times 1$ column vectors $x_i$, $i = 1, 2, \ldots, N$, which satisfy

$$x_i = F\xi + e_i,$$

where $F$ is a known $p \times q$ matrix, $\xi$ is an unknown $q$-dimensional vector, and $e_i$ is a $p$-dimensional error vector. This model has been generalized by Gleser and Olkin [1966] in their discussion of $k$-sample growth curves.

All these models, the classical multivariate linear model and the growth curves models, can be generalized to a model first discussed by Potthoff and Roy [1964] and later by Rao [1965] and Gleser and Olkin [1969]. We may write the model, which we refer to as the Potthoff-Roy model, in the following way:

$$X = F_1 \Xi F_2 + E, \tag{0.0.3}$$

where $X$ is a $c \times N$ matrix of observations, $F_1$ and $F_2$ are known $c \times p$ ($p \le c$) and $m \times N$ ($m \le N$) matrices respectively, $\Xi$ is an unknown $p \times m$ matrix, and $E$ is a $c \times N$ error matrix. Each column of $E$ is distributed independently with mean vector $0$ and unknown covariance matrix $\Sigma$.
Potthoff and Roy [1964] gave ad hoc tests of the hypothesis

$$F_3 \Xi F_4 = \Xi_0, \tag{0.0.4}$$

where $F_3$, $F_4$, and $\Xi_0$ are known $r \times p$ ($r < p$), $m \times k$ ($k \le m$), and $r \times k$ matrices respectively. $F_1$ and $F_4$ are assumed to have full column rank, and $F_2$ and $F_3$ are assumed to have full row rank. Rao [1965] found the conditional likelihood ratio test of the hypothesis stated above, and Gleser and Olkin [1969] showed that Rao's conditional likelihood ratio test is actually the unconditional likelihood ratio test.

In Chapter 3, we work with the Potthoff-Roy model (0.0.3) and estimate parameters under a hypothesis similar to (0.0.4). The hypothesis we discuss is concerned with unknown linear restrictions on the regression coefficient matrix. This hypothesis can be written in the following way:

$$U_1 \Xi F_4 = \alpha b, \tag{0.0.5}$$

where $U_1$ is an unknown $r \times p$ ($r \le p$) matrix, $F_4$ is a known $m \times k$ matrix, $\alpha$ is an unknown $r \times s$ matrix, and $b$ is a known $s \times k$ matrix. We assume that the unknown covariance matrix $\Sigma$ has the form $\sigma^2 I_c$, where $\sigma^2$ is unknown. In Chapter 3 we reduce the Potthoff-Roy model and the above hypothesis (0.0.5) to a canonical form. We also find the MLE's of the parameters in the general model (0.0.3), (0.0.5) and in the reduced model. As in Chapter 1, we discuss consistency of the estimators when the number of parameters is allowed to increase with the sample size.
Chapter 4 bears the same relationship to Chapter 3 that Chapter 2 bears to Chapter 1. In Chapter 4, we derive the likelihood ratio test statistic for testing

$$H_0: U_1 \Xi F_4 = \alpha b \quad \text{versus} \quad H_1: U_1 \Xi F_4 \ne \alpha b.$$

We find the asymptotic distributions of the likelihood ratio test statistic; these depend on how the number of parameters increases with the sample size. In several cases, the asymptotic distribution is not the usual chi-square distribution.
CHAPTER I

ESTIMATION OF UNKNOWN LINEAR RESTRICTIONS
ON THE PARAMETERS OF THE CLASSICAL
MULTIVARIATE LINEAR REGRESSION MODEL

1.0 Introduction

In this chapter, we discuss estimation of the parameters of the classical multivariate linear regression model (Anderson [1958, Chapter 8]) when a hypothesis concerned with unknown linear restrictions on the parameters is assumed to be true. Section 1.1 contains the derivation of the maximum likelihood estimators (MLE's) of the parameters, while Section 1.2 derives consistency properties of the MLE's. We show that some of the estimators are not consistent when the number of parameters in the model increases with the sample size. Several special cases of our model are discussed in Section 1.3, including the multivariate linear functional model (Madansky [1959], Moran [1971], Sprent [1969], Villegas [1961]) and models proposed by Kristoff [1973] and Rao [1973]. In all of our special cases, the independent variables in the regression model are dummy variables.

1.1 Maximum Likelihood Estimation

Let our model be:

$$x_i = \Xi f_i + e_i, \quad i = 1, 2, \ldots, N, \tag{1.1.1}$$

where each $x_i$ is a $p$-dimensional vector of dependent variables, each $f_i$ is a $k$-dimensional vector of independent variables or covariates ($k \ge p$), $\Xi$ is an unknown $p \times k$ parameter matrix of regression coefficients, and the $e_i$'s are $p$-dimensional vectors of errors.

We assume that the $e_i$'s are statistically independent of one another and have the same normal distribution with mean vector $0$ and unknown covariance matrix $\Sigma$. We will find the maximum likelihood estimators (MLE's) of $\Sigma$, $\Xi$, and two other matrices $B$ and $\alpha$ which satisfy

$$B\Xi = \alpha a, \tag{1.1.2}$$

where $a$ is a known $s \times k$ matrix ($s \le k$, $k - s \ge p$), $B$ is an unknown $r \times p$ ($r < p$) matrix, and $\alpha$ is an unknown $r \times s$ ($s \le r$) matrix. We are concerned with cases in which either $a$ has full row rank or $a$ is the zero matrix; in the latter case the hypothesis becomes $B\Xi = 0$. It should be noted that if $a$ is neither the zero matrix nor of full row rank, we can reparametrize so that the resulting matrix has full row rank. We derive the MLE's of the parameters when $a$ has full row rank. Since the proof is similar (actually easier) when $a$ is the zero matrix, we merely state the results in that case. In all of our special cases (see Section 1.3), $a = (1, 1, \ldots, 1)$ or $a$ is the zero matrix.

Anderson [1951a] considered the above problem when $a$ is the zero matrix. His derivation of the MLE's uses Lagrange multipliers and differentiation of the likelihood function. A derivation similar to the one we give when $a$ has full row rank could be used as an alternative method of obtaining and verifying the MLE's in Anderson's problem. We believe that such a derivation would be simpler and more intuitive than Anderson's. Since it does not employ differentiation, we would not have to worry about saddle points and the like. In his paper, Anderson [1951a] also gives methods of generating confidence intervals and likelihood ratio tests of various hypotheses.

Our computations are simplified greatly if we write (1.1.1) in the following way:

$$X = \Xi F + E, \tag{1.1.3}$$

where

$$X = (x_1, x_2, \ldots, x_N), \qquad F = (f_1, f_2, \ldots, f_N), \qquad E = (e_1, e_2, \ldots, e_N).$$

We call $X$ the observation matrix, $F$ the covariate matrix, and $E$ the error matrix. We assume that $F$ and $a$ have full row rank.

Maximizing the likelihood with respect to many parameters can be done in several ways. One way is to: 1) fix one of the parameters (i.e., treat it as given); 2) maximize the likelihood with respect to the other parameters (note: the resulting MLE's of the other parameters will be functions of the fixed parameter); 3) substitute these MLE's back into the likelihood; and finally 4) maximize the likelihood with respect to the parameter that had been fixed. We follow this method, with $B$ treated as the fixed parameter.
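The four-step strategy above is what is now usually called profiling the likelihood. The sketch below is a minimal illustration under simple assumptions: it uses a univariate normal sample rather than the matrix model, and it assumes NumPy is available. It fixes the mean, maximizes over the variance in closed form, and substitutes that back into the likelihood. It then maximizes the resulting profile over a grid of means. The maximizer lands on the sample mean, which is the familiar closed-form MLE. The function name `profile_loglik` and the grid are illustrative choices, not part of the text.

```python
import numpy as np

def profile_loglik(mu, x):
    """Steps 1-3: hold mu fixed, maximize the normal log-likelihood over
    sigma^2 (closed form), and substitute sigma2_hat(mu) back in."""
    n = x.size
    sigma2_hat = np.mean((x - mu) ** 2)          # MLE of sigma^2 given mu
    return -0.5 * n * np.log(2 * np.pi * sigma2_hat) - 0.5 * n

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.5, size=200)     # hypothetical data

# Step 4: maximize the profile over the parameter that had been fixed.
grid = np.linspace(0.0, 4.0, 4001)               # grid step 0.001
values = np.array([profile_loglik(m, x) for m in grid])
mu_hat = grid[np.argmax(values)]
print(mu_hat, x.mean())                          # agree to within the grid step
```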
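Part I. B fixed or given

We now transform $X$ into a form in which the proper estimators of the parameters are easy to see. Let $C$ be a $(p-r) \times p$ matrix which satisfies $CC' = I_{p-r}$ and $CB' = 0$. Let

$$Z = \begin{pmatrix} B \\ C \end{pmatrix} X, \qquad Y = Z\,\big(F'(FF')^{-1/2},\ L\big) = \begin{pmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{pmatrix}, \tag{1.1.4}$$

where $L$ is an $N \times (N-k)$ matrix which satisfies $L'L = I_{N-k}$ and $L'F = 0$, and $Y_{11}$, $Y_{12}$, $Y_{21}$, $Y_{22}$ are $r \times k$, $r \times (N-k)$, $(p-r) \times k$, and $(p-r) \times (N-k)$, respectively. Each column of $Z$ is distributed independently with a $p$-dimensional normal distribution having covariance matrix

$$\Phi = \begin{pmatrix} B \\ C \end{pmatrix} \Sigma \begin{pmatrix} B \\ C \end{pmatrix}' = \begin{pmatrix} \Phi_{11} & \Phi_{12} \\ \Phi_{21} & \Phi_{22} \end{pmatrix}.$$

Note that

$$E(Y) = E\Big[\begin{pmatrix} B \\ C \end{pmatrix} X \big(F'(FF')^{-1/2},\ L\big)\Big] = \begin{pmatrix} \alpha a F \\ C\Xi F \end{pmatrix}\big(F'(FF')^{-1/2},\ L\big) = \begin{pmatrix} \alpha a (FF')^{1/2} & 0 \\ C\Xi (FF')^{1/2} & 0 \end{pmatrix}.$$

Since $(F'(FF')^{-1/2}, L)$ is an orthogonal matrix, the columns of $Y$ are independently normally distributed with covariance matrix $\Phi$.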
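The following is a quick numerical check of the transformation, assuming NumPy. The sketch builds one admissible $L$ from the singular value decomposition of a hypothetical random $F$; any orthonormal basis of the null space of $F$ would do. It then confirms that $(F'(FF')^{-1/2}, L)$ is orthogonal and that $F$ maps it to $((FF')^{1/2}, 0)$, which is what produces the zero blocks in $E(Y)$. The helper names are hypothetical.

```python
import numpy as np

def sym_power(A, power):
    """A**power for a symmetric positive definite A, via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * w**power) @ V.T

def null_space_basis(F):
    """Orthonormal basis of {v : F v = 0} as columns: one valid choice of L."""
    k = F.shape[0]
    _, _, Vt = np.linalg.svd(F, full_matrices=True)
    return Vt[k:].T                                   # N x (N - k)

rng = np.random.default_rng(0)
k, N = 3, 10
F = rng.normal(size=(k, N))                           # hypothetical full-row-rank F
FFt = F @ F.T
L = null_space_basis(F)
Q = np.hstack([F.T @ sym_power(FFt, -0.5), L])        # (F'(FF')^{-1/2}, L)

print(np.allclose(Q.T @ Q, np.eye(N)))                # Q is orthogonal
print(np.allclose(F @ Q[:, :k], sym_power(FFt, 0.5))) # F F'(FF')^{-1/2} = (FF')^{1/2}
print(np.allclose(F @ L, 0))                          # L'F = 0
```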

We have now transformed the data $X$ into a form in which it is easy to find the estimators. Let us write the joint density of $Y$ in the following way:

$$f(Y) = f(Y_{21} \mid Y_{11})\, f(Y_{11}) \cdot f(Y_{22} \mid Y_{12})\, f(Y_{12}), \tag{1.1.5}$$

where $f(Y_{21} \mid Y_{11})$ denotes the conditional density of $Y_{21}$ given $Y_{11}$, $f(Y_{11})$ denotes the marginal density of $Y_{11}$, and so on. Since the columns of $Y$ are independent normally distributed random vectors, all of the densities in (1.1.5) are normal densities.

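The parameters in our transformed model are $\alpha$, $C\Xi$, and $\Phi$. An equivalent parametrization is

$$\alpha,\quad C\Xi,\quad \Phi_{11},\quad \Phi_{21}\Phi_{11}^{-1},\quad \text{and} \quad \Phi_{22\cdot 1} = \Phi_{22} - \Phi_{21}\Phi_{11}^{-1}\Phi_{12}.$$

We note that in (1.1.5) only $f(Y_{21} \mid Y_{11})$ depends on $C\Xi$, and only $f(Y_{11})$ and $f(Y_{21} \mid Y_{11})$ depend on $\alpha$ in their parametrizations.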

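Thus, we begin by finding the MLE of $C\Xi$ assuming that $\alpha$, $\Phi_{21}\Phi_{11}^{-1}$, $\Phi_{11}$, and $\Phi_{22\cdot 1}$ are fixed. Given $Y_{11}$, the columns of $Y_{21}$ are independently normal with covariance matrix $\Phi_{22\cdot 1}$ and

$$E(Y_{21} \mid Y_{11}) = C\Xi(FF')^{1/2} + \Phi_{21}\Phi_{11}^{-1}\big(Y_{11} - \alpha a (FF')^{1/2}\big),$$

so that

$$f(Y_{21} \mid Y_{11}) \le \frac{1}{(2\pi)^{(p-r)k/2}\,|\Phi_{22\cdot 1}|^{k/2}}. \tag{1.1.6}$$

If

$$C\Xi = \big[Y_{21} - \Phi_{21}\Phi_{11}^{-1}\big(Y_{11} - \alpha a (FF')^{1/2}\big)\big](FF')^{-1/2}, \tag{1.1.7}$$

then it is clear that $f(Y_{21} \mid Y_{11})$ attains its maximum (1.1.6). We may rewrite (1.1.5) to get

$$f(Y) \le \frac{1}{(2\pi)^{(p-r)k/2}\,|\Phi_{22\cdot 1}|^{k/2}}\, f(Y_{11})\, f(Y_{22} \mid Y_{12})\, f(Y_{12}), \tag{1.1.8}$$

with equality when (1.1.7) holds.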

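We next maximize the right-hand side of (1.1.8) with respect to $\alpha$, treating $\Phi_{21}\Phi_{11}^{-1}$, $\Phi_{11}$, and $\Phi_{22\cdot 1}$ as fixed. We know that

$$f(Y_{11}) = \frac{1}{|\Phi_{11}|^{k/2}(2\pi)^{rk/2}} \exp\Big\{-\tfrac12 \operatorname{tr} \Phi_{11}^{-1}\big(Y_{11} - \alpha a (FF')^{1/2}\big)\big(Y_{11} - \alpha a (FF')^{1/2}\big)'\Big\}.$$

Using the theory of multivariate regression, we get

$$f(Y_{11}) \le \frac{1}{|\Phi_{11}|^{k/2}(2\pi)^{rk/2}} \exp\Big\{-\tfrac12 \operatorname{tr} \Phi_{11}^{-1} Y_{11} M Y_{11}'\Big\}, \tag{1.1.9}$$

where $M = I_k - (FF')^{1/2}a'(aFF'a')^{-1}a(FF')^{1/2}$. Equality in (1.1.9) occurs only when

$$\alpha = Y_{11}(FF')^{1/2} a'(aFF'a')^{-1}. \tag{1.1.10}$$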


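Substituting (1.1.9) into (1.1.8), we get

$$f(Y) \le \frac{\exp\{-\tfrac12 \operatorname{tr} \Phi_{11}^{-1} Y_{11} M Y_{11}'\}}{|\Phi_{11}|^{k/2}(2\pi)^{pk/2}|\Phi_{22\cdot 1}|^{k/2}}\, f(Y_{22} \mid Y_{12})\, f(Y_{12}). \tag{1.1.11}$$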


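We now maximize the right-hand side of (1.1.11) with respect to $\Phi_{11}$, keeping $\Phi_{22\cdot 1}$ and $\Phi_{21}\Phi_{11}^{-1}$ fixed. Since $f(Y_{12}) = |\Phi_{11}|^{-(N-k)/2}(2\pi)^{-r(N-k)/2}\exp\{-\tfrac12 \operatorname{tr}\Phi_{11}^{-1}Y_{12}Y_{12}'\}$, (1.1.11) can be written

$$f(Y) \le \frac{\exp\{-\tfrac12 \operatorname{tr} \Phi_{11}^{-1}(Y_{12}Y_{12}' + Y_{11}MY_{11}')\}}{|\Phi_{11}|^{N/2}(2\pi)^{[pk + r(N-k)]/2}|\Phi_{22\cdot 1}|^{k/2}}\, f(Y_{22} \mid Y_{12}),$$

where $f(Y_{22} \mid Y_{12})$ does not depend on $\Phi_{11}$. Using Lemma 3.2.2 of Anderson [1958], we have

$$f(Y) \le \frac{e^{-rN/2}}{|\hat\Phi_{11}|^{N/2}(2\pi)^{[pk + r(N-k)]/2}|\Phi_{22\cdot 1}|^{k/2}}\, f(Y_{22} \mid Y_{12}), \tag{1.1.12}$$

where $\hat\Phi_{11} = (Y_{12}Y_{12}' + Y_{11}MY_{11}')/N$.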

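Finally, we maximize the right-hand side of (1.1.12) with respect to $\Phi_{21}\Phi_{11}^{-1}$ and $\Phi_{22\cdot 1}$. We know that

$$\frac{f(Y_{22} \mid Y_{12})}{|\Phi_{22\cdot 1}|^{k/2}} = \frac{\exp\{-\tfrac12 \operatorname{tr} \Phi_{22\cdot 1}^{-1}(Y_{22} - \Phi_{21}\Phi_{11}^{-1}Y_{12})(Y_{22} - \Phi_{21}\Phi_{11}^{-1}Y_{12})'\}}{|\Phi_{22\cdot 1}|^{N/2}(2\pi)^{(N-k)(p-r)/2}}$$

$$\le \frac{\exp\{-\tfrac12 \operatorname{tr} \Phi_{22\cdot 1}^{-1} Y_{22}\big(I_{N-k} - Y_{12}'(Y_{12}Y_{12}')^{-1}Y_{12}\big)Y_{22}'\}}{|\Phi_{22\cdot 1}|^{N/2}(2\pi)^{(N-k)(p-r)/2}}, \tag{1.1.13}$$

with equality only when

$$\Phi_{21}\Phi_{11}^{-1} = Y_{22}Y_{12}'(Y_{12}Y_{12}')^{-1}.$$

Using Lemma 3.2.2 of Anderson [1958], we have

$$\frac{f(Y_{22} \mid Y_{12})}{|\Phi_{22\cdot 1}|^{k/2}} \le \frac{e^{-N(p-r)/2}}{|\hat\Phi_{22\cdot 1}|^{N/2}(2\pi)^{(N-k)(p-r)/2}}, \tag{1.1.14}$$

where

$$\hat\Phi_{22\cdot 1} = N^{-1} Y_{22}\big(I_{N-k} - Y_{12}'(Y_{12}Y_{12}')^{-1}Y_{12}\big)Y_{22}'.$$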


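Combining (1.1.12) through (1.1.14), we get

$$f(Y) \le \frac{e^{-Np/2}}{(2\pi)^{Np/2}\big(|\hat\Phi_{11}|\,|\hat\Phi_{22\cdot 1}|\big)^{N/2}}. \tag{1.1.15}$$

There will be equality in (1.1.15) if

$$\begin{aligned}
\widehat{C\Xi} &= \big[Y_{21} - \widehat{\Phi_{21}\Phi_{11}^{-1}}\big(Y_{11} - \hat\alpha a (FF')^{1/2}\big)\big](FF')^{-1/2},\\
\hat\alpha &= Y_{11}(FF')^{1/2}a'(aFF'a')^{-1},\\
\widehat{\Phi_{21}\Phi_{11}^{-1}} &= Y_{22}Y_{12}'(Y_{12}Y_{12}')^{-1},\\
\hat\Phi_{11} &= N^{-1}\Big(Y_{12}Y_{12}' + Y_{11}\big(I_k - (FF')^{1/2}a'(aFF'a')^{-1}a(FF')^{1/2}\big)Y_{11}'\Big),\\
\hat\Phi_{22\cdot 1} &= N^{-1}Y_{22}\big(I_{N-k} - Y_{12}'(Y_{12}Y_{12}')^{-1}Y_{12}\big)Y_{22}'.
\end{aligned}$$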


Now we go backwards and express $\Xi$ and $\alpha$ in terms of $X$. After a little simplification, using the facts that

$$\begin{pmatrix} B \\ C \end{pmatrix}^{-1} = \big(B'(BB')^{-1},\ C'\big), \qquad C'C = I_p - B'(BB')^{-1}B, \qquad LL' = I_N - F'(FF')^{-1}F,$$

we obtain

$$\hat\Xi = XA - X(I_N - AF)X'B'\big(BX(I_N - AF)X'B'\big)^{-1}BX(A - G), \tag{1.1.16}$$

$$\hat\alpha = BXF'a'(aFF'a')^{-1}, \tag{1.1.17}$$

where

$$A = F'(FF')^{-1}, \qquad G = F'a'(aFF'a')^{-1}a. \tag{1.1.18}$$

We can also go backwards and find $\Sigma$. When we do this, we get $\hat\Sigma = N^{-1}(X - \hat\Xi F)(X - \hat\Xi F)'$.

We may summarize our results so far in the following theorem.

Theorem 1.1.1. When $B$ is fixed, the MLE's of $\alpha$, $\Xi$, and $\Sigma$ in the model given by (1.1.2) and (1.1.3) are

$$\hat\alpha = BXF'a'(aFF'a')^{-1},$$

$$\hat\Xi = XA - X(I_N - AF)X'B'\big(BX(I_N - AF)X'B'\big)^{-1}BX(A - G),$$

$$\hat\Sigma = N^{-1}(X - \hat\Xi F)(X - \hat\Xi F)',$$

where $A$ and $G$ are given by (1.1.18).
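The formulas of Theorem 1.1.1 translate directly into matrix code. The sketch below is a minimal NumPy version that assumes toy dimensions ($p=3$, $r=1$, $k=4$, $s=1$, $a=(1,\ldots,1)$) and simulated data. `mle_given_B` and `loglik` are hypothetical helper names. The sketch checks two things. First, $B\hat\Xi = \hat\alpha a$ holds exactly. Second, for the fixed $B$, the log-likelihood at the estimates is at least its value at the true parameters, as it must be for a maximizer.

```python
import numpy as np

def loglik(X, F, Xi, Sigma):
    """Gaussian log-likelihood of X = Xi F + E, columns of E i.i.d. N(0, Sigma)."""
    p, N = X.shape
    R = X - Xi @ F
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (p * N * np.log(2 * np.pi) + N * logdet
                   + np.trace(np.linalg.solve(Sigma, R @ R.T)))

def mle_given_B(X, F, a, B):
    """Theorem 1.1.1: MLEs of alpha, Xi, Sigma with B held fixed."""
    N = X.shape[1]
    aFFa_inv = np.linalg.inv(a @ F @ F.T @ a.T)
    A = F.T @ np.linalg.inv(F @ F.T)               # (1.1.18)
    G = F.T @ a.T @ aFFa_inv @ a                   # (1.1.18)
    W = X @ (np.eye(N) - A @ F) @ X.T              # X (I_N - AF) X'
    alpha = B @ X @ F.T @ a.T @ aFFa_inv           # (1.1.17)
    Xi = X @ A - W @ B.T @ np.linalg.solve(B @ W @ B.T, B @ X @ (A - G))  # (1.1.16)
    R = X - Xi @ F
    return alpha, Xi, R @ R.T / N

rng = np.random.default_rng(2)
p, r, k, s, N = 3, 1, 4, 1, 60
F = rng.normal(size=(k, N))
a = np.ones((s, k))                                # a = (1, ..., 1)
B = rng.normal(size=(r, p))
alpha_true = rng.normal(size=(r, s))
Xi0 = rng.normal(size=(p, k))
Xi_true = Xi0 + B.T @ np.linalg.solve(B @ B.T, alpha_true @ a - B @ Xi0)  # B Xi = alpha a
Sigma_true = np.diag([1.0, 0.5, 2.0])
X = Xi_true @ F + rng.multivariate_normal(np.zeros(p), Sigma_true, size=N).T

alpha_hat, Xi_hat, Sigma_hat = mle_given_B(X, F, a, B)
print(np.allclose(B @ Xi_hat, alpha_hat @ a))                                # restriction holds
print(loglik(X, F, Xi_hat, Sigma_hat) >= loglik(X, F, Xi_true, Sigma_true))  # maximizer
```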

Part II. Substitution of parameters back into the likelihood and maximization with respect to B

If we substitute the estimators of $\alpha$, $\Xi$, and $\Sigma$ given in Theorem 1.1.1 (note: they are functions of $B$) into the likelihood for $X$, we find that

$$\max_{C\Xi,\ \alpha,\ \Phi} \log f(X) = -\tfrac12 pN \log 2\pi - \tfrac12 N \log|\hat\Sigma| - \tfrac12 pN. \tag{1.1.19}$$

Maximizing (1.1.19) with respect to $B$ is equivalent to minimizing $|\hat\Sigma|$ with respect to $B$. After simplification we get

$$|\hat\Sigma| = \frac{1}{N^p}\,|W|\,\frac{|BTB'|}{|BWB'|}, \tag{1.1.20}$$

where

$$W = X\big(I_N - F'(FF')^{-1}F\big)X', \tag{1.1.21}$$

$$T = X\big(I_N - F'a'(aFF'a')^{-1}aF\big)X'. \tag{1.1.22}$$
Note that in terms of MANOVA concepts, $W$ can be thought of as the within-groups sums-of-squares-and-products matrix and $T$ as the total sums-of-squares-and-products matrix.
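The following is a numerical sanity check of the simplification of $|\hat\Sigma|$. It assumes NumPy and uses arbitrary simulated $X$, $F$, and a fixed $B$; the identity is algebraic, so no model structure is needed. The sketch evaluates $\hat\Xi$ from (1.1.16) and $\hat\Sigma$ from Theorem 1.1.1. It then checks that the result reproduces $|\hat\Sigma| = N^{-p}|W|\,|BTB'|/|BWB'|$, with $W$ and $T$ taken from (1.1.21) and (1.1.22). The dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(9)
p, r, k, s, N = 4, 2, 5, 1, 50
F = rng.normal(size=(k, N))
a = np.ones((s, k))
X = rng.normal(size=(p, N))          # any data: the identity is algebraic
B = rng.normal(size=(r, p))          # arbitrary fixed B

A = F.T @ np.linalg.inv(F @ F.T)
G = F.T @ a.T @ np.linalg.inv(a @ F @ F.T @ a.T) @ a
W = X @ (np.eye(N) - A @ F) @ X.T    # (1.1.21)
T = X @ (np.eye(N) - G @ F) @ X.T    # (1.1.22): G F = F'a'(aFF'a')^{-1} aF
Xi_hat = X @ A - W @ B.T @ np.linalg.solve(B @ W @ B.T, B @ X @ (A - G))  # (1.1.16)
Sigma_hat = (X - Xi_hat @ F) @ (X - Xi_hat @ F).T / N

lhs = np.linalg.det(Sigma_hat)
rhs = np.linalg.det(W) * np.linalg.det(B @ T @ B.T) / (N**p * np.linalg.det(B @ W @ B.T))
print(np.isclose(lhs, rhs))
```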

Let $U = N^{-1/2}BW^{1/2}$. Then (1.1.20) becomes

$$|\hat\Sigma| = \frac{1}{N^p}\,|W|\,\frac{|UW^{-1/2}TW^{-1/2}U'|}{|UU'|}. \tag{1.1.23}$$

For purposes of minimizing (1.1.23), we might as well assume that $UU' = I_r$, for if $UU'$ does not equal the identity matrix, there exists an invertible matrix $H$ such that $U^* = HU$ also minimizes (1.1.23) and $U^*U^{*\prime} = I_r$.

If $UU' = I_r$, Theorem 10, page 129 of Bellman [1970] tells us that the minimum value of $|\hat\Sigma|$ is

$$|\hat\Sigma| = \frac{1}{N^p}\,|W|\,\lambda_p\lambda_{p-1}\cdots\lambda_{p-r+1}, \tag{1.1.24}$$

where $\lambda_i$ is the $i$th largest eigenvalue of $W^{-1/2}TW^{-1/2}$. Let $\Gamma$ be a $p \times r$ matrix whose columns are orthonormal eigenvectors associated with the $r$ smallest eigenvalues of $W^{-1/2}TW^{-1/2}$. If we choose $U$ to be $\Gamma'$, then the right-hand side of (1.1.23) achieves the minimum value of $|\hat\Sigma|$ given in (1.1.24). Thus, if we let

$$\hat B = N^{1/2}\,\Gamma' W^{-1/2}, \tag{1.1.25}$$

then the likelihood function is maximized. It is easy to show that the columns of $\hat B'$ are themselves eigenvectors of $W^{-1}T$ corresponding to the $r$ smallest eigenvalues of $W^{-1}T$.
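The construction (1.1.25) can be sketched as follows, assuming NumPy. `sym_power` computes symmetric matrix powers through an eigendecomposition, and `mle_B` is a hypothetical helper. By (1.1.23) with $U = N^{-1/2}BW^{1/2}$, $|\hat\Sigma|$ depends on $B$ only through $|BTB'|/|BWB'|$. The sketch therefore checks four things:

- $N^{-1}\hat BW\hat B' = I_r$;
- the columns of $\hat B'$ are eigenvectors of $W^{-1}T$;
- the ratio at $\hat B$ equals the product of the $r$ smallest eigenvalues;
- a random $B$ does no better.

The simulated data are illustrative assumptions.

```python
import numpy as np

def sym_power(A, power):
    """A**power for a symmetric positive definite A, via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * w**power) @ V.T

def mle_B(X, F, a, r):
    """B_hat = N^{1/2} Gamma' W^{-1/2} as in (1.1.25)."""
    N = X.shape[1]
    aF = a @ F
    W = X @ (np.eye(N) - F.T @ np.linalg.solve(F @ F.T, F)) @ X.T      # (1.1.21)
    T = X @ (np.eye(N) - aF.T @ np.linalg.solve(aF @ aF.T, aF)) @ X.T   # (1.1.22)
    W_mhalf = sym_power(W, -0.5)
    lam, Gamma = np.linalg.eigh(W_mhalf @ T @ W_mhalf)   # ascending eigenvalues
    B_hat = np.sqrt(N) * Gamma[:, :r].T @ W_mhalf         # r smallest eigenvectors
    return B_hat, W, T, lam

rng = np.random.default_rng(3)
p, r, k, s, N = 3, 1, 4, 1, 80
F = rng.normal(size=(k, N))
a = np.ones((s, k))
B_true = rng.normal(size=(r, p))
Xi0 = rng.normal(size=(p, k))
Xi = Xi0 + B_true.T @ np.linalg.solve(B_true @ B_true.T,
                                      rng.normal(size=(r, s)) @ a - B_true @ Xi0)
X = Xi @ F + rng.normal(size=(p, N))

B_hat, W, T, lam = mle_B(X, F, a, r)

def ratio(B):
    """|B T B'| / |B W B'|: the only factor of |Sigma_hat| that depends on B."""
    return np.linalg.det(B @ T @ B.T) / np.linalg.det(B @ W @ B.T)

print(np.allclose(B_hat @ W @ B_hat.T / N, np.eye(r)))
print(np.allclose(np.linalg.solve(W, T) @ B_hat.T, B_hat.T * lam[:r]))
print(np.isclose(ratio(B_hat), np.prod(lam[:r])))
print(ratio(rng.normal(size=(r, p))) >= np.prod(lam[:r]))
```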
We summarize our results in the following theorem.

Theorem 1.1.2. The MLE's of $B$, $\alpha$, $\Xi$, and $\Sigma$ in the model given by (1.1.2) and (1.1.3), assuming $a$ and $F$ have full row rank, are

$$\hat\alpha = \hat BXF'a'(aFF'a')^{-1},$$

$$\hat\Xi = XF'(FF')^{-1} - W\hat B'(\hat BW\hat B')^{-1}\hat BX\big(F'(FF')^{-1} - F'a'(aFF'a')^{-1}a\big),$$

$$\hat\Sigma = N^{-1}(X - \hat\Xi F)(X - \hat\Xi F)',$$

where

$$W = X\big(I_N - F'(FF')^{-1}F\big)X', \qquad T = X\big(I_N - F'a'(aFF'a')^{-1}aF\big)X',$$

and the columns of $\hat B'$ are the eigenvectors corresponding to the $r$ smallest eigenvalues of $W^{-1}T$.

Remark I. If we multiply $\hat B$ on the left by any invertible matrix $H$, the resulting matrix also maximizes the likelihood, since if $B^* = H\hat B$ with $|H| \ne 0$, then

$$\frac{|B^*TB^{*\prime}|}{|B^*WB^{*\prime}|} = \frac{|H\hat BT\hat B'H'|}{|H\hat BW\hat B'H'|} = \frac{|H|\,|\hat BT\hat B'|\,|H'|}{|H|\,|\hat BW\hat B'|\,|H'|} = \frac{|\hat BT\hat B'|}{|\hat BW\hat B'|}.$$
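Remark I is easy to confirm numerically. The sketch below assumes NumPy and uses hypothetical positive definite stand-ins for $W$ and $T$. They are built so that $T - W$ is positive semidefinite, as it is in the model. The sketch checks that $|BTB'|/|BWB'|$ is unchanged when $B$ is replaced by $HB$ for a random, almost surely invertible $H$.

```python
import numpy as np

def ratio(B, T, W):
    """|B T B'| / |B W B'|."""
    return np.linalg.det(B @ T @ B.T) / np.linalg.det(B @ W @ B.T)

rng = np.random.default_rng(8)
p, r = 4, 2
A1, A2 = rng.normal(size=(p, p)), rng.normal(size=(p, p))
W = A1 @ A1.T + np.eye(p)        # hypothetical positive definite stand-in for W
T = W + A2 @ A2.T                # T - W positive semidefinite, as in the model
B = rng.normal(size=(r, p))
H = rng.normal(size=(r, r))      # invertible with probability one

print(np.isclose(ratio(H @ B, T, W), ratio(B, T, W)))
```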

Remark II. All matrices which maximize the likelihood are of the form $H\hat B$ for some invertible $H$. We do not prove this, since a proof of the assertion is straightforward.

Remark III. We have been assuming that $F$ has full row rank. We now demonstrate how to reparametrize so that the results in Theorem 1.1.2 can be applied when $F$ is not of full row rank. Assume $c$ ($c < k$) is the rank of $F$, and $c - s \ge p$. Let

$$F = (\Gamma_1, \Gamma_2)\begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} U. \tag{1.1.26}$$

The right-hand side of (1.1.26) is the Eckart-Young decomposition, where $U$ and $(\Gamma_1, \Gamma_2)$ are orthogonal matrices and $D$ is an invertible $c \times c$ diagonal matrix. Now

$$\Xi F = \Xi(\Gamma_1, \Gamma_2)\begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}U = (\Xi\Gamma_1)(D, 0)U = \Xi^* F^*,$$

where $\Xi^* = \Xi\Gamma_1$ and $F^* = (D, 0)U$. Since $F^*$ has full row rank, we may use Theorem 1.1.2 to get the MLE's of the parameters. If $\hat\Xi^*$ is the MLE of $\Xi^*$, we have

$$\hat\Xi = (\hat\Xi^*, P)(\Gamma_1, \Gamma_2)',$$

where $P$ is any finite $p \times (k - c)$ matrix. Usually when $F$ is not of full row rank there are restrictions on $\Xi$. We can pick $P$ so that $\hat\Xi$ satisfies those restrictions.
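The reparametrization of Remark III can be checked with NumPy's singular value decomposition, which supplies one version of (1.1.26) with the singular values in descending order. The sizes below ($k=4$, $c=3$) and the random matrices are illustrative assumptions. The sketch confirms three things: $\Xi F = \Xi^* F^*$, $F^*$ has full row rank $c$, and any choice of $P$ gives the same fitted values.

```python
import numpy as np

rng = np.random.default_rng(4)
p, k, c, N = 2, 4, 3, 12
F = rng.normal(size=(k, c)) @ rng.normal(size=(c, N))  # k x N covariate matrix of rank c < k

Gamma, d, Ut = np.linalg.svd(F)    # F = Gamma [diag(d) 0] Ut, d in descending order
D = np.diag(d[:c])                 # invertible c x c block
F_star = D @ Ut[:c]                # F* = (D, 0) U, a c x N matrix
Gamma1, Gamma2 = Gamma[:, :c], Gamma[:, c:]

Xi = rng.normal(size=(p, k))
Xi_star = Xi @ Gamma1              # Xi* = Xi Gamma_1
print(np.allclose(Xi @ F, Xi_star @ F_star))
print(np.linalg.matrix_rank(F_star) == c)

# Any P yields a coefficient matrix with the same fitted values.
P = rng.normal(size=(p, k - c))
Xi_back = np.hstack([Xi_star, P]) @ np.hstack([Gamma1, Gamma2]).T
print(np.allclose(Xi_back @ F, Xi @ F))
```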

We now state a theorem which gives the MLE's for our model when $a$ is the zero matrix:

Theorem 1.1.3. The MLE's of $B$, $\Xi$, and $\Sigma$ in the model given by (1.1.2) and (1.1.3) when $a$ is the zero matrix, i.e., when (1.1.2) becomes $B\Xi = 0$, are

$$\hat\Xi = XF'(FF')^{-1} - W\hat B'(\hat BW\hat B')^{-1}\hat BXF'(FF')^{-1},$$

$$\hat\Sigma = N^{-1}(X - \hat\Xi F)(X - \hat\Xi F)',$$

where

$$W = X\big(I_N - F'(FF')^{-1}F\big)X', \qquad T = XX',$$

and the columns of $\hat B'$ are the eigenvectors corresponding to the $r$ smallest eigenvalues of $W^{-1}T$.
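The following is a minimal sketch of Theorem 1.1.3, assuming NumPy and simulated data that satisfy $B\Xi = 0$. Here $\hat B$ is obtained through a Cholesky factor of $W$ rather than through $W^{-1/2}$. It is therefore scaled so that $\hat BW\hat B' = I_r$ instead of $N I_r$. By Remark I, this rescaling (multiplication by an invertible $H$) does not affect the maximized likelihood. The helper names are hypothetical.

```python
import numpy as np

def loglik(X, F, Xi, Sigma):
    """Gaussian log-likelihood of X = Xi F + E, columns of E i.i.d. N(0, Sigma)."""
    p, N = X.shape
    R = X - Xi @ F
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (p * N * np.log(2 * np.pi) + N * logdet
                   + np.trace(np.linalg.solve(Sigma, R @ R.T)))

def mle_zero_a(X, F, r):
    """Theorem 1.1.3 (a = 0, so B Xi = 0), with T = X X'."""
    N = X.shape[1]
    A = F.T @ np.linalg.inv(F @ F.T)
    W = X @ (np.eye(N) - A @ F) @ X.T
    T = X @ X.T
    Lw = np.linalg.cholesky(W)                           # W = Lw Lw'
    M = np.linalg.solve(Lw, np.linalg.solve(Lw, T).T)    # Lw^{-1} T Lw^{-T}
    _, V = np.linalg.eigh(M)                             # ascending eigenvalues
    B_hat = np.linalg.solve(Lw.T, V[:, :r]).T            # columns of B_hat' are eigenvectors of W^{-1} T
    Xi_hat = X @ A - W @ B_hat.T @ np.linalg.solve(B_hat @ W @ B_hat.T, B_hat @ X @ A)
    R = X - Xi_hat @ F
    return B_hat, Xi_hat, R @ R.T / N

rng = np.random.default_rng(10)
p, r, k, N = 3, 1, 4, 40
F = rng.normal(size=(k, N))
B_true = rng.normal(size=(r, p))
Xi0 = rng.normal(size=(p, k))
Xi_true = Xi0 - B_true.T @ np.linalg.solve(B_true @ B_true.T, B_true @ Xi0)  # B_true Xi_true = 0
X = Xi_true @ F + rng.normal(size=(p, N))                                  # Sigma = I_p

B_hat, Xi_hat, Sigma_hat = mle_zero_a(X, F, r)
print(np.allclose(B_hat @ Xi_hat, 0))
print(loglik(X, F, Xi_hat, Sigma_hat) >= loglik(X, F, Xi_true, np.eye(p)))
```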

Let us now consider the model of Theorem 1.1.2 with one change: instead of assuming that each $e_i$ is independently normally distributed with common covariance matrix $\Sigma$, we now allow the $e_i$'s to be jointly normally distributed with mean vector $0$ and

$$\operatorname{cov}(e_i, e_j) = k_{ij}\,\Sigma, \tag{1.1.27}$$

where $K = (k_{ij})$ is a known invertible matrix. The maximum likelihood estimators of $\alpha$, $B$, $\Xi$, and $\Sigma$ are easy to compute using Theorem 1.1.2 and the following lemma:

Lemma 1. Let $Z = XK^{-1/2}$ (where $X$ comes from our new model). Then $E(Z) = \Xi FK^{-1/2}$, and the columns of $Z$ are independent, each with a $p$-dimensional normal distribution having covariance matrix $\Sigma$.

Proof. Since $Z$ is a linear combination of normally distributed random variables, it is itself normally distributed. Further,

$$E(Z) = E(XK^{-1/2}) = E(X)K^{-1/2} = \Xi FK^{-1/2}.$$

Let $(m_{ij}) = K^{-1/2}$. Then
1.2 Consistency of the Estimators

As the number of observations gets large, it is important to know what our estimators converge to. In most statistical problems the number of parameters stays fixed as the sample size increases. However, in this section we find out what the estimators from Theorem 1.1.2 converge to when the number $k$ of columns of $\Xi$ is allowed to increase with the sample size. The elements of our $\Xi$ matrix are what Neyman and Scott [1948] have called "incidental parameters". When incidental parameters are present, some estimators (as in our case) may turn out to be inconsistent. We will not discuss the consistency of the estimators in Theorem 1.1.3 or Theorem 1.1.4, since it is clear that we have analogous results. In our discussion, $p$ (the dimension of the dependent variable), $r$ (the row rank of $B$), and $s$ (the column rank of $\alpha$) are assumed to be fixed. It is evident that

$$t = \lim_{N\to\infty} \frac{N-k}{N}$$

is a measure of how fast the number of parameters increases with the sample size $N$. We will assume that $t$ is always greater than zero and less than or equal to one. If the number of parameters stays fixed, $t$ equals one. We will be concerned with the consistency of $\hat B$, $\hat\alpha$, and $\hat\Sigma$. We first discuss the consistency of $\hat B$ and $\hat\alpha$.

In order to make a discussion of the consistency of $\hat B$, $\hat\alpha$ meaningful, we have to place restrictions on $\hat B$ and $B$ which make these matrices unique. It should be remembered that if $\hat B$, $\hat\alpha$ maximize the likelihood, then so do $H\hat B$, $H\hat\alpha$, where $H$ is an invertible matrix. In fact all MLE's of $B$, $\alpha$ are of the form $H\hat B$, $H\hat\alpha$ for some invertible matrix $H$. Similarly, $(B, \alpha)$ satisfy

$$B\Xi = \alpha a \tag{1.2.1}$$

if and only if $HB$, $H\alpha$ satisfy $HB\Xi = H\alpha a$, where $H$ is an invertible matrix.

Let $B$, $\alpha$ be a pair of matrices which satisfy (1.2.1). By requiring $B$ to satisfy a number of restrictions, we can make $(B, \alpha)$ the unique matrices which satisfy (1.2.1). We will show that if $\hat B$ and $\hat\alpha$ are MLE's of $B$ and $\alpha$, and if $\hat B$ satisfies the same restrictions as $B$, then $\hat B$, $\hat\alpha$ converge almost surely to $B$, $\alpha$. We show this for only one particular set of restrictions. However, it is clear that if one set of MLE's $(\hat B_1, \hat\alpha_1)$ converges almost surely to $B_1$, $\alpha_1$, where $\hat B_1$ and $B_1$ satisfy one group of restrictions, then any other set of MLE's $\hat B_2$, $\hat\alpha_2$ will converge almost surely to $B_2$, $\alpha_2$, where $\hat B_2$ and $B_2$ satisfy another group of restrictions, provided the respective restrictions make $B_1$ and $B_2$ unique.

Let $B$, $\alpha$ be a pair of matrices which satisfy (1.2.1), and let

$$B^* = (B_2^{-1}B_1,\ I_r) = B_2^{-1}(B_1, B_2) = B_2^{-1}B, \qquad \alpha^* = B_2^{-1}\alpha,$$

where $B = (B_1, B_2)$, $B_1$ is $r \times (p-r)$, and $B_2$ is $r \times r$ and assumed invertible. $B^*$ is the only matrix with the identity as its last $r$ columns which satisfies (1.2.1). Similarly, if $\hat B$, $\hat\alpha$ are maximum likelihood estimators, we can generate another set of maximum likelihood estimators $\hat B^*$, $\hat\alpha^*$, where $\hat B^*$ has the identity matrix as its last $r$ columns:

$$\hat B^* = \hat B_2^{-1}(\hat B_1, \hat B_2) = \hat B_2^{-1}\hat B, \qquad \hat\alpha^* = \hat B_2^{-1}\hat\alpha.$$

Hence, $\hat B^*$ is the only MLE of $B$ which has the identity for its last $r$ columns. We will show that $\hat B^*$, $\hat\alpha^*$ converge almost surely to $B^*$, $\alpha^*$.
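The normalization $\hat B^* = \hat B_2^{-1}\hat B$ takes one line of code. The sketch below assumes NumPy and a random $B$ whose last $r \times r$ block is invertible, which holds with probability one here. It checks that the result has $I_r$ as its last $r$ columns. It also checks that every $HB$ produces the same $B^*$, which is why $B^*$ is a well-defined target for the consistency argument. The same $B_2^{-1}$ would be applied to $\alpha$ to obtain $\alpha^*$. The function name `normalize` is hypothetical.

```python
import numpy as np

def normalize(B):
    """B* = B_2^{-1} B, where B = (B_1, B_2) and B_2 is the last r x r block."""
    r = B.shape[0]
    return np.linalg.solve(B[:, -r:], B)

rng = np.random.default_rng(5)
r, p = 2, 4
B = rng.normal(size=(r, p))
H = rng.normal(size=(r, r))                        # hypothetical invertible matrix
B_star = normalize(B)

print(np.allclose(B_star[:, -r:], np.eye(r)))      # last r columns are I_r
print(np.allclose(normalize(H @ B), B_star))       # the same B* for every H B
```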

Lemma 1. If $N - k \to \infty$, then $(N-k)^{-1}W$ goes almost surely to $\Sigma$.

Proof. Recall that

$$W = X\big(I_N - F'(FF')^{-1}F\big)X' = (\Xi F + E)\big(I_N - F'(FF')^{-1}F\big)(\Xi F + E)' = E\big(I_N - F'(FF')^{-1}F\big)E'.$$

Each column of $E$ has an independent normal distribution with mean vector $0$ and covariance matrix $\Sigma$. By Theorem 4.3.2 in Anderson [1958], $W$ is distributed the same way as

$$\sum_{i=1}^{N-k} u_i u_i',$$

where the $u_i$ are independent $N(0, \Sigma)$ random vectors. We can conclude that

$$(N-k)^{-1}\sum_{i=1}^{N-k} u_i u_i'$$

converges almost surely to $\Sigma$. Therefore $(N-k)^{-1}W$ goes almost surely to $\Sigma$. Q.E.D.
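The role of $t$ and of the divisor $N-k$ can be seen in a small simulation. The sketch rests on assumptions that are not in the text:

- a one-way MANOVA design with $k$ groups of two observations each, so $t = 1/2$;
- hypothetical group means;
- a fixed $\Sigma$.

For this design, $X(I_N - F'(FF')^{-1}F)$ simply subtracts group means, so $W$ is the within-groups matrix. Dividing by $N-k$ recovers $\Sigma$, while dividing by $N$ gives roughly $t\Sigma$. This is the Neyman-Scott phenomenon.

```python
import numpy as np

rng = np.random.default_rng(6)
p, k, reps = 2, 5000, 2                       # k groups, 2 observations each, N = 2k
N = k * reps
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
Xi = rng.normal(scale=3.0, size=(p, k))       # hypothetical incidental group means
E = np.linalg.cholesky(Sigma) @ rng.normal(size=(p, N))
X = np.repeat(Xi, reps, axis=1) + E           # columns ordered group by group

# For a one-way MANOVA design, X (I_N - F'(FF')^{-1} F) subtracts the group means.
groups = X.reshape(p, k, reps)
resid = (groups - groups.mean(axis=2, keepdims=True)).reshape(p, N)
W = resid @ resid.T

print(np.round(W / (N - k), 2))   # close to Sigma
print(np.round(W / N, 2))         # close to t * Sigma = Sigma / 2
```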

Lemma 2. Let $z_1, z_2, \ldots$ be independent identically distributed random variables with mean $0$ and common finite variance. Let $b_{nm}$ ($m \le n$, $n = 1, 2, \ldots$) be any array of real numbers satisfying

$$\lim_{n\to\infty} \sum_{m=1}^{n} b_{nm}^2 = v, \qquad 0 < v < \infty.$$

Then $n^{-1/2}\sum_{m=1}^{n} b_{nm} z_m$ goes to $0$ almost surely.

Proof. The proof is in Chow [1966]. Q.E.D.

Lemma 3. Assume that

$$R = \lim_{N\to\infty} N^{-1}\,\Xi F\big(I_N - F'a'(aFF'a')^{-1}aF\big)F'\Xi'$$

exists and is finite. Then

$$N^{-1}E\big(I_N - F'a'(aFF'a')^{-1}aF\big)F'\Xi' \tag{1.2.2}$$

goes almost surely to zero.

Proof. Consider the $(i, j)$th element of (1.2.2). That element is the product of the $i$th row of $N^{-1/2}E$ and the $j$th column of

$$N^{-1/2}\big(I_N - F'a'(aFF'a')^{-1}aF\big)F'\Xi'. \tag{1.2.3}$$

Each element in the $i$th row of $E$ is independent with mean $0$ and common variance. Since $I_N - F'a'(aFF'a')^{-1}aF$ is symmetric and idempotent, the sum of the squares of the elements in the $j$th column of (1.2.3) is the $(j, j)$th element of

$$N^{-1}\,\Xi F\big(I_N - F'a'(aFF'a')^{-1}aF\big)F'\Xi'.$$

By our hypothesis, this element converges to something finite as $N$ goes to infinity. By Lemma 2, the $(i, j)$th element of (1.2.2) goes almost surely to zero. Q.E.D.
Lemma 4. Assume that $R$ (as defined in Lemma 3) exists and is finite. Then $N^{-1}T$ goes almost surely to $R + \Sigma$.

Proof. Recall that

$$\begin{aligned}
N^{-1}T &= N^{-1}X\big(I_N - F'a'(aFF'a')^{-1}aF\big)X'\\
&= N^{-1}(\Xi F + E)\big(I_N - F'a'(aFF'a')^{-1}aF\big)(\Xi F + E)'\\
&= N^{-1}\Xi F\big(I_N - F'a'(aFF'a')^{-1}aF\big)F'\Xi' + N^{-1}E\big(I_N - F'a'(aFF'a')^{-1}aF\big)F'\Xi'\\
&\quad + N^{-1}\Xi F\big(I_N - F'a'(aFF'a')^{-1}aF\big)E' + N^{-1}E\big(I_N - F'a'(aFF'a')^{-1}aF\big)E'.
\end{aligned} \tag{1.2.4}$$

By our hypothesis the first term in (1.2.4) converges to $R$. By Lemma 3, the second and third terms go almost surely to zero. If we use Theorem 4.3.2 in Anderson [1958], we find that the fourth term in (1.2.4) has the same distribution as

$$N^{-1}\sum_{i=1}^{N-s} u_i u_i',$$

where each $u_i$ has a normal distribution with mean vector $0$ and covariance matrix $\Sigma$, and $u_i$ and $u_j$ are independent if $i \ne j$. We know that $(N-s)^{-1}\sum_{i=1}^{N-s} u_i u_i'$ goes almost surely to $\Sigma$. Since $s$ is fixed as $N$ goes to infinity, $N^{-1}E\big(I_N - F'a'(aFF'a')^{-1}aF\big)E'$ goes almost surely to $\Sigma$. Using all of the above arguments, we have that $N^{-1}T$ goes almost surely to $R + \Sigma$. Q.E.D.


Lemma 5. The columns of $B^{*\prime}$ are eigenvectors of $I_p + \Sigma^{-1}R$ corresponding to the eigenvalue one.

Proof. Since $\Xi'B^{*\prime} = (B^*\Xi)' = a'\alpha^{*\prime}$, we know that for every $N$

$$N^{-1}\Xi F\big(I_N - F'a'(aFF'a')^{-1}aF\big)F'\Xi'B^{*\prime} = N^{-1}\Xi F\big(I_N - F'a'(aFF'a')^{-1}aF\big)F'a'\alpha^{*\prime} = 0.$$

Because of the above, $RB^{*\prime} = 0$. We therefore have $(I_p + \Sigma^{-1}R)B^{*\prime} = B^{*\prime}$. Q.E.D.

Theorem 1.2.1. Under the assumptions of Lemma 4, and assuming $R$ is of rank $p - r$, $\hat B^*$ is a strongly consistent estimator of $B^*$.

Proof. Since $t > 0$, $N - k \to \infty$, so by Lemma 1 we have $(N-k)^{-1}W \to \Sigma$ almost surely. By Lemma 4, $N^{-1}T \to R + \Sigma$ almost surely. Combining these statements we get

$$\big((N-k)^{-1}W\big)^{-1}\big(N^{-1}T\big) \xrightarrow{\text{a.s.}} \Sigma^{-1}(\Sigma + R) = I_p + \Sigma^{-1}R.$$

Since $\lim_{N\to\infty}(N-k)/N = t > 0$, we have

$$W^{-1}T \xrightarrow{\text{a.s.}} (1/t)\big(I_p + \Sigma^{-1}R\big).$$

Since the eigenvalues of a matrix are continuous functions of the elements of that matrix, the eigenvalues of $W^{-1}T$ converge almost surely to the eigenvalues of $(1/t)(I_p + \Sigma^{-1}R)$. Since $R$ is positive semidefinite of rank $p - r$ and $\Sigma$ is positive definite, the smallest eigenvalue of $(1/t)(I_p + \Sigma^{-1}R)$ is $1/t$, and it has multiplicity $r$. The $r$ smallest eigenvalues of $W^{-1}T$ must therefore go almost surely to $1/t$.
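The eigenvalue limit in the proof can be illustrated by simulation. The sketch below assumes the following:

- NumPy is available;
- a one-way MANOVA design with two observations per group, so $t = 1/2$;
- $a = (1,\ldots,1)$;
- a hypothetical restriction $B = (0, 0, 1)$, $\alpha = 1$, so that $B^* = (0,0,1)$ and $R$ has rank $p - r = 2$.

The smallest eigenvalue of $W^{-1}T$ should be near $1/t = 2$, with the others well above it. The normalized eigenvector should be near $B^*$.

```python
import numpy as np

rng = np.random.default_rng(7)
p, k, reps = 3, 3000, 2
N = k * reps                                   # t = (N - k) / N = 1/2
Sigma = np.array([[1.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 0.5]])
# Group means with third coordinate 1: B Xi = alpha a for B = (0, 0, 1), alpha = 1.
Xi = np.vstack([rng.normal(scale=2.0, size=(2, k)), np.ones((1, k))])
X = np.repeat(Xi, reps, axis=1) + np.linalg.cholesky(Sigma) @ rng.normal(size=(p, N))

groups = X.reshape(p, k, reps)
Rw = (groups - groups.mean(axis=2, keepdims=True)).reshape(p, N)
W = Rw @ Rw.T                                  # within-groups matrix (1.1.21)
Rt = X - X.mean(axis=1, keepdims=True)
T = Rt @ Rt.T                                  # total matrix (1.1.22) with a = (1, ..., 1)

Lw = np.linalg.cholesky(W)                           # W = Lw Lw'
M = np.linalg.solve(Lw, np.linalg.solve(Lw, T).T)    # Lw^{-1} T Lw^{-T}: same eigenvalues as W^{-1} T
lam, V = np.linalg.eigh(M)                           # ascending
b = np.linalg.solve(Lw.T, V[:, 0])                   # eigenvector of W^{-1} T, smallest eigenvalue
B_star_hat = b / b[-1]                               # last coordinate normalized to 1

print(np.round(lam, 2))          # smallest eigenvalue near 1/t = 2
print(np.round(B_star_hat, 2))   # near B* = (0, 0, 1)
```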

Let $\hat B^*_N$ be the estimator of $B^*$ when we have $N$ observations. Let $\hat B_N$ be the estimator given in Theorem 1.1.2 ($\hat B_N$ satisfies $N^{-1}\hat B_N W \hat B_N' = I_r$) used to generate $\hat B^*_N$, i.e.,

$$\hat B^*_N = \big(\hat B_N^{(2)}\big)^{-1}\hat B_N = \Big(\big(\hat B_N^{(2)}\big)^{-1}\hat B_N^{(1)},\ I_r\Big),$$

where $\hat B_N = (\hat B_N^{(1)}, \hat B_N^{(2)})$. Because $N^{-1}\hat B_N W\hat B_N' = I_r$ and $N^{-1}W$ converges almost surely to $t\Sigma$, $\hat B_N$ is bounded almost surely. Let us pick any subsequence of $\hat B_N$. Since $\hat B_N$ is almost surely bounded, there must exist a subsequence of this subsequence which converges. Let $\hat B_{N'}$ denote the convergent subsequence, and let $C$ be defined by

$$\lim_{N'\to\infty} \hat B_{N'} = C.$$

Every column of $C'$ is the limit of a sequence of eigenvectors of $W^{-1}T$ associated with an eigenvalue which goes almost surely to $1/t$. Since $W^{-1}T$ converges almost surely to $(1/t)(I_p + \Sigma^{-1}R)$, each column of $C'$ must be an eigenvector of $(1/t)(I_p + \Sigma^{-1}R)$ associated with the eigenvalue $1/t$. Since

$$\lim_{N'\to\infty} N'^{-1}\hat B_{N'} W \hat B_{N'}' = tC\Sigma C' = I_r,$$

$C$ is of full row rank. Therefore the rows of $C$ span the space of eigenvectors of $(1/t)(I_p + \Sigma^{-1}R)$ associated with $1/t$, which has dimension $r$. By Lemma 5, the rows of $B^*$ also span this space. Therefore there exists an invertible matrix $V$ such that

$$B^* = (B_2^{-1}B_1,\ I_r) = VC.$$

If $C = (C^{(1)}, C^{(2)})$, then $V$ must equal $(C^{(2)})^{-1}$, and

$$B^* = \big(C^{(2)}\big)^{-1}C.$$

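Let $\|A\|$ denote the largest absolute value of any element of $A$. We know that

$$\|\hat{B}^*_{N'} - B^*\| = \left\|\left(\hat{B}^{(2)}_{N'}\right)^{-1}\hat{B}_{N'} - \left(C^{(2)}\right)^{-1}C\right\|,$$

$$\|\hat{B}^*_{N'} - B^*\| \le \left\|\left(\hat{B}^{(2)}_{N'}\right)^{-1}\hat{B}_{N'} - \left(C^{(2)}\right)^{-1}\hat{B}_{N'}\right\| + \left\|\left(C^{(2)}\right)^{-1}\hat{B}_{N'} - \left(C^{(2)}\right)^{-1}C\right\|,$$

$$\|\hat{B}^*_{N'} - B^*\| \le r\,\|\hat{B}_{N'}\|\,\left\|\left(\hat{B}^{(2)}_{N'}\right)^{-1} - \left(C^{(2)}\right)^{-1}\right\| + r\,\left\|\left(C^{(2)}\right)^{-1}\right\|\,\|\hat{B}_{N'} - C\|. \tag{1.2.5}$$

(Each element of the product of a matrix with $r$ columns and a matrix with $r$ rows is a sum of $r$ products, which accounts for the factor $r$.)

The first term on the right-hand side of (1.2.5) is arbitrarily small when $N'$ is large, since $\|\hat{B}_{N'}\|$ is almost surely bounded and $\hat{B}^{(2)}_{N'}$ converges to the invertible matrix $C^{(2)}$, so that $(\hat{B}^{(2)}_{N'})^{-1}$ converges to $(C^{(2)})^{-1}$. The second term vanishes since $(C^{(2)})^{-1}$ is bounded and $\hat{B}_{N'}$ goes almost surely to $C$. We therefore have that $\hat{B}^*_{N'}$ goes almost surely to $B^*$. We have shown that for any subsequence of $\hat{B}^*_N$ there exists a further subsequence which converges almost surely to $B^*$. Hence $\hat{B}^*_N$ must converge almost surely to $B^*$. Q.E.D.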
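A small simulation can make Theorem 1.2.1 concrete. The setup is hypothetical and only a sketch: the replicated-means model of Section 1.3 with $p = 2$, $r = 1$, $\Sigma = I_2$, $a = (1, \ldots, 1)$, and $k$ groups of $n_i = 2$ observations (so $t = 1/2$), with group means on the line $-2\xi_{i1} + \xi_{i2} = 1$, so that $B^* = (-2, 1)$. The helper `smallest_eig` (a name introduced here) solves $Tb' = \lambda Wb'$ through a Cholesky factor of $W$ instead of forming $W^{-1}T$ directly.

```python
import numpy as np


def smallest_eig(W, T, r):
    """Return the r smallest eigenvalues of W^{-1}T and the matching r x p matrix B-hat."""
    L = np.linalg.cholesky(W)                      # W = L L'
    Linv = np.linalg.inv(L)
    vals, Y = np.linalg.eigh(Linv @ T @ Linv.T)    # symmetric form, eigenvalues ascending
    B = np.linalg.solve(L.T, Y[:, :r]).T           # each row b satisfies W^{-1} T b' = lambda b'
    return vals[:r], B


def simulate_bstar(k, n=2, seed=0):
    """Hypothetical example: group means on the line -2*x1 + x2 = 1, Sigma = I_2."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, 2.0, size=k)
    xi = np.vstack([u, 2.0 * u + 1.0])             # 2 x k matrix of group means
    X = np.repeat(xi, n, axis=1) + rng.normal(size=(2, k * n))
    xbar_i = X.reshape(2, k, n).mean(axis=2)       # sample group means
    within = X - np.repeat(xbar_i, n, axis=1)
    total = X - X.mean(axis=1, keepdims=True)
    W, T = within @ within.T, total @ total.T      # W and T for a = (1, ..., 1)
    vals, B = smallest_eig(W, T, r=1)
    return vals[0], B / B[:, -1:]                  # rescale so the last column equals I_r


lam, b_star = simulate_bstar(k=10_000)
print(lam, b_star)
```

With these settings the smallest root should be close to $1/t = 2$ and $\hat B^*$ close to $(-2, 1)$; the agreement should tighten as `k` grows.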

Theorem 1.2.2. If $N(aFF'a')^{-1}$ converges to a matrix with all elements finite, then $\hat{\alpha}^*$ is a strongly consistent estimate of $\alpha^*$.

Proof. Note that

$$\hat{\alpha}^* = \hat{B}^*XF'a'(aFF'a')^{-1} = \hat{B}^*(\Xi F + E)F'a'(aFF'a')^{-1},$$

$$\hat{\alpha}^* = \hat{B}^*\Xi FF'a'(aFF'a')^{-1} + \hat{B}^*EF'a'(aFF'a')^{-1}. \tag{1.2.6}$$

Since $\hat{B}^*$ goes almost surely to $B^*$, the first term on the right of (1.2.6) goes almost surely to

$$B^*\Xi FF'a'(aFF'a')^{-1} = \alpha^* aFF'a'(aFF'a')^{-1} = \alpha^*.$$

By applying Lemma 2 in a way similar to what we did in Lemma 3, we know that $N(aFF'a')^{-1}$ converging to a finite matrix implies that $EF'a'(aFF'a')^{-1}$ goes almost surely to zero. We can conclude that $\hat{B}^*EF'a'(aFF'a')^{-1}$ converges almost surely to zero. Q.E.D.

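We now must discuss the consistency of $\hat\Sigma$. It should be noted that the MLE's of $\Xi$ and $\Sigma$ are unique; they do not depend on the choice of MLE of $B$ and $\alpha$. Because of this, we will use $\hat{B}^*$ as the MLE of $B$ and $\hat{\alpha}^*$ as the MLE of $\alpha$. We have seen that

$$\begin{aligned}
\hat\Sigma &= N^{-1}(X - \hat\Xi F)(X - \hat\Xi F)' \\
&= N^{-1}\Big(X - XF'(FF')^{-1}F + W\hat{B}^{*\prime}(\hat{B}^*W\hat{B}^{*\prime})^{-1}\hat{B}^*X\big(F'(FF')^{-1}F - F'a'(aFF'a')^{-1}aF\big)\Big) \\
&\quad\times\Big(X - XF'(FF')^{-1}F + W\hat{B}^{*\prime}(\hat{B}^*W\hat{B}^{*\prime})^{-1}\hat{B}^*X\big(F'(FF')^{-1}F - F'a'(aFF'a')^{-1}aF\big)\Big)'.
\end{aligned}$$

After a little simplification which uses the definitions of $W$ and $T$ (the cross-product terms vanish because $(I_N - F'(FF')^{-1}F)(F'(FF')^{-1}F - F'a'(aFF'a')^{-1}aF) = 0$), we get

$$\hat\Sigma = N^{-1}W + N^{-1}W\hat{B}^{*\prime}(\hat{B}^*W\hat{B}^{*\prime})^{-1}\hat{B}^*(T - W)\hat{B}^{*\prime}(\hat{B}^*W\hat{B}^{*\prime})^{-1}\hat{B}^*W.$$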
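The simplification can be checked numerically. In this sketch the dimensions and the matrices $X$, $F$, $a$, $B$ are random and hypothetical; $\hat\Xi F$ is taken as $XF'(FF')^{-1}F - W B'(BWB')^{-1}BX\big(F'(FF')^{-1}F - F'a'(aFF'a')^{-1}aF\big)$, matching the expression inside the product above. The identity is purely algebraic, so any full-rank $r\times p$ matrix $B$ works, not only the MLE.

```python
import numpy as np


def proj(A):
    """Orthogonal projection onto the row space of A (A must have full row rank)."""
    return A.T @ np.linalg.solve(A @ A.T, A)


def sigma_identity_gap(p=4, k=6, N=40, s=2, r=2, seed=0):
    """Max abs difference between the direct and simplified forms of Sigma-hat."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(p, N))
    F = rng.normal(size=(k, N))
    a = rng.normal(size=(s, k))
    B = rng.normal(size=(r, p))                    # any full-rank r x p matrix
    I = np.eye(N)
    PF, PaF = proj(F), proj(a @ F)
    W = X @ (I - PF) @ X.T
    T = X @ (I - PaF) @ X.T
    M = W @ B.T @ np.linalg.solve(B @ W @ B.T, B)  # W B'(B W B')^{-1} B
    resid = X - (X @ PF - M @ X @ (PF - PaF))      # X - Xi_hat F
    direct = resid @ resid.T / N
    simplified = (W + M @ (T - W) @ M.T) / N
    return np.max(np.abs(direct - simplified))


print(sigma_identity_gap())
```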

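From our previous lemmas and theorems we know that

$$N^{-1}W \xrightarrow{\text{a.s.}} t\Sigma,\qquad \hat{B}^* \xrightarrow{\text{a.s.}} B^*,\qquad N^{-1}T \xrightarrow{\text{a.s.}} \Sigma + R,\qquad RB^{*\prime} = 0.$$

Using the above, and noting that $B^*(\Sigma + R - t\Sigma)B^{*\prime} = (1-t)B^*\Sigma B^{*\prime}$, we have

$$\begin{aligned}
\hat\Sigma &\xrightarrow{\text{a.s.}} t\Sigma + \Sigma B^{*\prime}(B^*\Sigma B^{*\prime})^{-1}B^*(\Sigma + R - t\Sigma)B^{*\prime}(B^*\Sigma B^{*\prime})^{-1}B^*\Sigma \\
&= t\Sigma + (1-t)\,\Sigma B^{*\prime}(B^*\Sigma B^{*\prime})^{-1}B^*\Sigma.
\end{aligned}$$

Since the above expression is valid regardless of which $B$ we take in the class of $B$'s which satisfy $B\Xi = \alpha a$, we have the following theorem: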
Theorem 1.2.3. If we assume the conditions given in Lemmas 3 and 4 and in Theorem 1.2.2, then $\hat\Sigma$ goes almost surely to

$$t\Sigma + (1-t)\,\Sigma B'(B\Sigma B')^{-1}B\Sigma. \tag{1.2.7}$$

The most startling thing about the above is not that $\hat\Sigma$ is not a consistent estimate; when the number of parameters gets large, the estimate of the covariance matrix is usually inconsistent. What makes the above unusual is the fact that the limit of $\hat\Sigma$ is a function of $B$, through the second term in (1.2.7). (When $t = 1$, the limit reduces to $\Sigma$.)
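To see the inconsistency numerically, the following sketch uses a hypothetical setup: $p = 2$, $r = 1$, $n_i = 2$ (so $t = 1/2$), $\Sigma = I_2$, $a = (1, \ldots, 1)$ and group means on the line $-2\xi_{i1} + \xi_{i2} = 1$, so $B^* = (-2, 1)$. It computes $\hat\Sigma$ from the simplified expression of this section and compares it with (1.2.7), which here equals $\begin{pmatrix}0.9 & -0.2\\ -0.2 & 0.6\end{pmatrix}$ rather than $\Sigma = I_2$.

```python
import numpy as np


def sigma_hat_vs_limit(k, n=2, seed=0):
    """Return (Sigma-hat, t*Sigma + (1-t)*Sigma B'(B Sigma B')^{-1} B Sigma) for a hypothetical example."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, 2.0, size=k)
    xi = np.vstack([u, 2.0 * u + 1.0])             # means satisfy (-2, 1) xi_i = 1
    N = k * n
    X = np.repeat(xi, n, axis=1) + rng.normal(size=(2, N))   # Sigma = I_2
    xbar_i = X.reshape(2, k, n).mean(axis=2)
    within = X - np.repeat(xbar_i, n, axis=1)
    total = X - X.mean(axis=1, keepdims=True)
    W, T = within @ within.T, total @ total.T
    # B-hat: eigenvector for the smallest eigenvalue of W^{-1}T
    L = np.linalg.cholesky(W)
    Linv = np.linalg.inv(L)
    _, Y = np.linalg.eigh(Linv @ T @ Linv.T)
    B = np.linalg.solve(L.T, Y[:, :1]).T
    M = W @ B.T @ np.linalg.solve(B @ W @ B.T, B)
    sigma_hat = (W + M @ (T - W) @ M.T) / N
    t = (N - k) / N
    b = np.array([[-2.0, 1.0]])                    # true B* in this example
    limit = t * np.eye(2) + (1 - t) * (b.T @ b) / (b @ b.T)
    return sigma_hat, limit


sh, lim = sigma_hat_vs_limit(10_000)
print(np.round(sh, 3))
print(lim)
```

The estimate should settle near the printed limit, whose $(2,2)$ element is well below the true variance 1.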

We cannot discuss the consistency of $\hat\Xi$, since $\Xi$ is not a fixed matrix of parameters. It is interesting to consider to what

$$N^{-1}\hat\Xi F\left(I_N - F'a'(aFF'a')^{-1}aF\right)F'\hat\Xi'$$

converges almost surely. We might expect it to converge almost surely to $R$, as

$$N^{-1}\Xi F\left(I_N - F'a'(aFF'a')^{-1}aF\right)F'\Xi'$$

does. However, if we went through a proof, we would find that it actually goes almost surely to

$$R + (1-t)\Sigma - (1-t)\,\Sigma B'(B\Sigma B')^{-1}B\Sigma.$$
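The limit can be traced through two algebraic facts, which this sketch checks numerically with random, hypothetical inputs. First, writing $M = W\hat B'(\hat BW\hat B')^{-1}\hat B$, one has $\hat\Xi F(I_N - F'a'(aFF'a')^{-1}aF)F'\hat\Xi' = (I_p - M)(T - W)(I_p - M)'$. Second, with $N^{-1}(T - W) \to R + (1-t)\Sigma$, $M \to M_0 = \Sigma B'(B\Sigma B')^{-1}B$ and $RB' = 0$, the product $(I_p - M_0)(R + (1-t)\Sigma)(I_p - M_0)'$ equals $R + (1-t)\Sigma - (1-t)\Sigma B'(B\Sigma B')^{-1}B\Sigma$.

```python
import numpy as np


def proj(A):
    """Orthogonal projection onto the row space of A (full row rank assumed)."""
    return A.T @ np.linalg.solve(A @ A.T, A)


def xi_hat_identity_gap(p=4, k=6, N=40, s=2, r=2, seed=0):
    """Check Xi_hat F (I - P_aF) F' Xi_hat' = (I - M)(T - W)(I - M)' for random inputs."""
    rng = np.random.default_rng(seed)
    X, F = rng.normal(size=(p, N)), rng.normal(size=(k, N))
    a, B = rng.normal(size=(s, k)), rng.normal(size=(r, p))
    I, Ip = np.eye(N), np.eye(p)
    PF, PaF = proj(F), proj(a @ F)
    W = X @ (I - PF) @ X.T
    T = X @ (I - PaF) @ X.T
    M = W @ B.T @ np.linalg.solve(B @ W @ B.T, B)
    xi_hat_F = X @ PF - M @ X @ (PF - PaF)
    lhs = xi_hat_F @ (I - PaF) @ xi_hat_F.T
    rhs = (Ip - M) @ (T - W) @ (Ip - M).T
    return np.max(np.abs(lhs - rhs))


def limit_algebra_gap(p=4, r=2, t=0.5, seed=0):
    """Check (I - M0)(R + (1-t)S)(I - M0)' = R + (1-t)(S - S B'(B S B')^{-1} B S) when R B' = 0."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(p, p))
    S = A @ A.T + p * np.eye(p)                    # plays the role of Sigma (positive definite)
    B = rng.normal(size=(r, p))
    Q = np.eye(p) - B.T @ np.linalg.solve(B @ B.T, B)   # projector onto the null space of B
    G = Q @ rng.normal(size=(p, p))
    R = G @ G.T                                    # positive semidefinite with R B' = 0
    M0 = S @ B.T @ np.linalg.solve(B @ S @ B.T, B)
    Ip = np.eye(p)
    lhs = (Ip - M0) @ (R + (1 - t) * S) @ (Ip - M0).T
    P = S @ B.T @ np.linalg.solve(B @ S @ B.T, B @ S)
    rhs = R + (1 - t) * S - (1 - t) * P
    return np.max(np.abs(lhs - rhs))


print(xi_hat_identity_gap(), limit_algebra_gap())
```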
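1.3. Special Cases

Special cases of the models we discussed have come up many times in the literature. We will be discussing cases when the $F$ matrix has the following form:

$$F = \begin{pmatrix}
1 & 1 & \cdots & 1 & 0 & 0 & \cdots & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 1 & 1 & \cdots & 1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & & & & & & & & & & & \vdots \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & \cdots & 1 & 1 & \cdots & 1
\end{pmatrix}, \tag{1.3.1}$$

where the $i$th row contains $n_i$ ones.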

If the $F$ matrix has the above form, our additional information consists of knowing that some of the observations come from the same mean, i.e., we have replications at each mean. The model could be written this way:

$$x_{ij} = \xi_i + e_{ij},\qquad i = 1,2,\ldots,k;\ j = 1,2,\ldots,n_i, \tag{1.3.2}$$

where $\Xi = (\xi_1, \xi_2, \ldots, \xi_k)$ and $E = (e_{11}, e_{12}, \ldots, e_{1n_1}, \ldots, e_{k1}, \ldots, e_{kn_k})$.

Note: In all of our special cases, $N = \sum_{i=1}^{k} n_i$.

We will need the MLE's in the following two cases. The first case specifies that the set of mean vectors lies in a lower, $(p-r)$-dimensional, space passing through the origin:

$$B\xi_i = 0,\qquad \forall i. \tag{1.3.3}$$

The second case specifies that the set of mean vectors lies in a lower-dimensional space which can pass through any point:

$$B\xi_i = \alpha,\qquad \forall i. \tag{1.3.4}$$

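For the first case, we will apply Theorem 1.1.3 with $F$ as defined by (1.3.1). Our result is:

Application 1. When our model is

$$x_{ij} = \xi_i + e_{ij},\quad i = 1,2,\ldots,k;\ j = 1,2,\ldots,n_i,\qquad B\xi_i = 0,$$

then the MLE's of $B$, $\xi_i$, and $\Sigma$ are

$$\hat\xi_i = \bar{x}_i - W\hat{B}'(\hat{B}W\hat{B}')^{-1}\hat{B}\bar{x}_i,$$

$$\hat\Sigma = N^{-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \hat\xi_i)(x_{ij} - \hat\xi_i)',$$

where $\bar{x}_i = n_i^{-1}\sum_{j=1}^{n_i}x_{ij}$,

$$W = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)',\qquad T = \sum_{i=1}^{k}\sum_{j=1}^{n_i}x_{ij}x_{ij}',$$

and the columns of $\hat{B}'$ are the eigenvectors corresponding to the $r$ smallest eigenvalues of $W^{-1}T$.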
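Application 1 translates directly into a few lines of linear algebra. The sketch below is one possible implementation under stated assumptions: the data arrive as a list of $p\times n_i$ arrays, one per mean (a hypothetical input layout), and the eigenvectors of $W^{-1}T$ come from the symmetric problem for $L^{-1}TL^{-1\prime}$ with $W = LL'$. The test relies only on the exact property $\hat B\hat\xi_i = 0$.

```python
import numpy as np


def within_ssp(groups):
    """W = sum_i sum_j (x_ij - xbar_i)(x_ij - xbar_i)'."""
    return sum((g - g.mean(axis=1, keepdims=True)) @ (g - g.mean(axis=1, keepdims=True)).T
               for g in groups)


def smallest_eigvecs(W, T, r):
    """Rows of B-hat: eigenvectors of W^{-1}T for its r smallest eigenvalues."""
    L = np.linalg.cholesky(W)
    Linv = np.linalg.inv(L)
    _, Y = np.linalg.eigh(Linv @ T @ Linv.T)
    return np.linalg.solve(L.T, Y[:, :r]).T


def app1_mle(groups, r):
    """Application 1 (B xi_i = 0). groups: list of p x n_i arrays, one per mean."""
    X = np.hstack(groups)
    N = X.shape[1]
    xbar = np.column_stack([g.mean(axis=1) for g in groups])    # p x k group means
    W = within_ssp(groups)
    T = X @ X.T                                                  # sum_ij x_ij x_ij'
    B = smallest_eigvecs(W, T, r)
    M = W @ B.T @ np.linalg.solve(B @ W @ B.T, B)
    xi_hat = xbar - M @ xbar
    resid = np.hstack([g - xi_hat[:, [i]] for i, g in enumerate(groups)])
    return B, xi_hat, resid @ resid.T / N
```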

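For the second case we can apply Theorem 1.1.2 with $a = (1, 1, \ldots, 1)$ and $F$ as defined by (1.3.1) to get:

Application 2. When our model is

$$x_{ij} = \xi_i + e_{ij},\quad i = 1,2,\ldots,k;\ j = 1,2,\ldots,n_i,\qquad B\xi_i = \alpha,$$

then the MLE's of $B$, $\alpha$, $\xi_i$, and $\Sigma$ are

$$\hat\alpha = \hat{B}\bar{x},\qquad \hat\xi_i = \bar{x}_i - W\hat{B}'(\hat{B}W\hat{B}')^{-1}\hat{B}(\bar{x}_i - \bar{x}),$$

$$\hat\Sigma = N^{-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \hat\xi_i)(x_{ij} - \hat\xi_i)',$$

where $\bar{x} = N^{-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i}x_{ij}$,

$$W = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)',\qquad T = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x})(x_{ij} - \bar{x})',$$

and the columns of $\hat{B}'$ are the eigenvectors corresponding to the $r$ smallest eigenvalues of $W^{-1}T$.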
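The same sketch adapts to Application 2: $T$ is now centered at the grand mean, $\hat\alpha = \hat B\bar x$, and $\hat\xi_i$ shrinks $\bar x_i - \bar x$ rather than $\bar x_i$. The input layout (a list of $p\times n_i$ arrays) is again an assumption. By construction $\hat B\hat\xi_i = \hat B\bar x = \hat\alpha$ for every $i$, which the test checks.

```python
import numpy as np


def app2_mle(groups, r):
    """Application 2 (B xi_i = alpha). groups: list of p x n_i arrays, one per mean."""
    X = np.hstack(groups)
    N = X.shape[1]
    xbar_i = np.column_stack([g.mean(axis=1) for g in groups])  # p x k group means
    xbar = X.mean(axis=1, keepdims=True)                        # grand mean
    W = sum((g - g.mean(axis=1, keepdims=True)) @ (g - g.mean(axis=1, keepdims=True)).T
            for g in groups)
    T = (X - xbar) @ (X - xbar).T
    L = np.linalg.cholesky(W)
    Linv = np.linalg.inv(L)
    _, Y = np.linalg.eigh(Linv @ T @ Linv.T)
    B = np.linalg.solve(L.T, Y[:, :r]).T                        # r smallest roots of W^{-1}T
    M = W @ B.T @ np.linalg.solve(B @ W @ B.T, B)
    alpha_hat = B @ xbar
    xi_hat = xbar_i - M @ (xbar_i - xbar)
    resid = np.hstack([g - xi_hat[:, [i]] for i, g in enumerate(groups)])
    return B, alpha_hat, xi_hat, resid @ resid.T / N
```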

The model in Application 2 is the same model Rao [1973] considers when he discusses a test for dimensionality. His test of dimensionality is a test of the hypothesis that $B\xi_i = \alpha$ versus the hypothesis $B\xi_i \neq \alpha$. His test statistic turns out to be similar to the likelihood ratio test statistic, although he neither mentions nor proves this. He does find the likelihood ratio test when $\Sigma$ is known.

Villegas [1961] considers both Application 1 and Application 2, the first of which he calls a homogeneous linear functional relationship. All of Villegas's results are valid only for a single functional relationship, i.e., when $B$ is a row vector. Through geometrical arguments similar to the techniques used by Max Van Uven [1930], who derived estimates of $B$ and $\Xi$ when $\Sigma$ is known, Villegas derived maximum likelihood estimators which agree with Anderson's and ours: $\hat{B}$ turns out to be the eigenvector associated with the smallest eigenvalue of $W^{-1}T$. Villegas also discusses cases in which Theorem 1.1.4 applies, i.e., when $K \neq I$. He shows that the covariance matrix has the form needed in Theorem 1.1.4 when it arises from certain experimental designs (mainly incomplete block designs). Since our results are valid when $B$ has any rank ($\le p-1$), they can be thought of as extensions of Villegas's results for a single functional relationship.

We can give another application which fits directly into a one-way analysis of variance. Let our model be

$$x_{ij} = \mu + \xi_i + e_{ij},$$

where $\mu$ is the unknown grand mean. We will make the common assumption that $\sum_{i=1}^{k} n_i\xi_i = 0$. We will be fitting parameters under the hypothesis that

$$B\xi_i = 0.$$

It should be noted that $B\xi_i$ cannot equal anything but zero when $\sum_i n_i\xi_i = 0$: if $B\xi_i = \alpha$ for all $i$, then $N\alpha = B\sum_i n_i\xi_i = 0$. The MLE of $\mu$ is

$$\hat\mu = \bar{x}.$$

If we substitute $\hat\mu$ into the likelihood, we have exactly the same maximization problem that is solved in Theorem 1.1.3, except that we will use

$$X^* = X - \bar{x}(1, 1, \ldots, 1) = (x_{11} - \bar{x}, x_{12} - \bar{x}, \ldots, x_{kn_k} - \bar{x})$$

instead of $X$. If we use $X^*$ and $F$ as defined by (1.3.1) in Theorem 1.1.3, we get the following application:
Application 3. When our model is

$$x_{ij} = \mu + \xi_i + e_{ij},\quad i = 1,2,\ldots,k;\ j = 1,2,\ldots,n_i,\qquad B\xi_i = 0,$$

then the MLE's of $\mu$, $B$, $\xi_i$, and $\Sigma$ are

$$\hat\mu = \bar{x},\qquad \hat\xi_i = \bar{x}_i - \bar{x} - W\hat{B}'(\hat{B}W\hat{B}')^{-1}\hat{B}(\bar{x}_i - \bar{x}),$$

$$\hat\Sigma = N^{-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \hat\mu - \hat\xi_i)(x_{ij} - \hat\mu - \hat\xi_i)',$$

where

$$W = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)(x_{ij} - \bar{x}_i)',\qquad T = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x})(x_{ij} - \bar{x})',$$

and the columns of $\hat{B}'$ are the $r$ eigenvectors corresponding to the $r$ smallest eigenvalues of $W^{-1}T$.
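A sketch of Application 3 under the same assumed input layout: it centers at $\hat\mu = \bar x$ and then proceeds as in Application 1 with $X^*$ in place of $X$. Two exact properties make convenient checks: $\hat B\hat\xi_i = 0$ and $\sum_i n_i\hat\xi_i = 0$ (the latter because $\sum_i n_i(\bar x_i - \bar x) = 0$).

```python
import numpy as np


def app3_mle(groups, r):
    """Application 3: x_ij = mu + xi_i + e_ij, B xi_i = 0. groups: list of p x n_i arrays."""
    X = np.hstack(groups)
    N = X.shape[1]
    mu_hat = X.mean(axis=1, keepdims=True)                      # grand mean xbar
    xbar_i = np.column_stack([g.mean(axis=1) for g in groups])
    W = sum((g - g.mean(axis=1, keepdims=True)) @ (g - g.mean(axis=1, keepdims=True)).T
            for g in groups)
    T = (X - mu_hat) @ (X - mu_hat).T                           # X* X*'
    L = np.linalg.cholesky(W)
    Linv = np.linalg.inv(L)
    _, Y = np.linalg.eigh(Linv @ T @ Linv.T)
    B = np.linalg.solve(L.T, Y[:, :r]).T
    M = W @ B.T @ np.linalg.solve(B @ W @ B.T, B)
    d = xbar_i - mu_hat                                         # xbar_i - xbar
    xi_hat = d - M @ d
    resid = np.hstack([g - mu_hat - xi_hat[:, [i]] for i, g in enumerate(groups)])
    return mu_hat, B, xi_hat, resid @ resid.T / N
```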

The model considered in Application 3 is a generalization of the model given by Kristoff [1973]. Kristoff gives an ad hoc goodness-of-fit test for his model which is actually equivalent to the likelihood ratio test.

In all applications so far, the estimate of $B$ which was given and which maximizes the likelihood was unique only up to multiplication on the left by a nonsingular matrix. By picking a unique member from the class of maximum likelihood estimators, as we did in our section on the consistency of the estimators, we will show that another class of models can be handled with our method.

Consider the following model:

$$y_{ij} = \nu_i + m_{ij},\qquad z_{ij} = H\nu_i + g_{ij};\qquad i = 1,2,\ldots,k;\ j = 1,2,\ldots,n_i, \tag{1.3.5}$$

where $y_{ij}$ and $z_{ij}$ are $(p-r)$- and $r$-dimensional vectors of observations respectively, $\nu_i$ is a $(p-r)$-dimensional parameter vector, $H$ is an unknown $r\times(p-r)$ parameter matrix, and $(m_{ij}', g_{ij}')'$ is the error term. We will be trying to estimate $\nu_i$ and $H$. The most reasonable assumption (according to Acton [1959]) about the distribution of the errors is that each $(m_{ij}', g_{ij}')'$ has a joint normal distribution with mean 0 and unknown covariance matrix $\Sigma$. Errors arising from different observations are independent of each other. We will now show that our new model (1.3.5) is just another application of the model in Theorem 1.1.3.

If we let

$$x_{ij} = \begin{pmatrix} y_{ij} \\ z_{ij} \end{pmatrix},\qquad e_{ij} = \begin{pmatrix} m_{ij} \\ g_{ij} \end{pmatrix},\qquad \xi_i = \begin{pmatrix} \nu_i \\ H\nu_i \end{pmatrix},$$

our new model (1.3.5) can be rewritten as

$$x_{ij} = \xi_i + e_{ij}.$$

We also have a side condition that

$$(-H, I_r)\,\xi_i = 0,\qquad \forall i.$$

This formulation of (1.3.5) is very similar to Application 1, the only difference being that now we want the last $r$ columns of $B$ to form the identity matrix, as we did in the section on the consistency of the estimators. If we let

$$\hat{B}^* = \left(\left(\hat{B}^{(2)}\right)^{-1}\hat{B}^{(1)},\ I_r\right) = \left(\hat{B}^{(2)}\right)^{-1}\left(\hat{B}^{(1)}, \hat{B}^{(2)}\right) = \left(\hat{B}^{(2)}\right)^{-1}\hat{B},$$

where $\hat{B} = (\hat{B}^{(1)}, \hat{B}^{(2)})$ is the estimate of $B$ from Application 1, then $\hat{B}^*$ is the only matrix with the correct form which maximizes the likelihood. $\hat{B}^{(2)}$ will be invertible with probability one; however, if it is close to being singular (one of its eigenvalues is very small), our results will be misleading. This would indicate that there is a strong internal relationship between the $p-r$ variables composing $y_{ij}$. Since $\hat{B}^*$ is the only matrix of the correct form which is a maximum likelihood estimate, $-(\hat{B}^{(2)})^{-1}\hat{B}^{(1)}$ must be the maximum likelihood estimate of $H$. From Application 1 we can also get the MLE's of $\xi_i$ and $\Sigma$. Since $\nu_i$ is the top $p-r$ rows of $\xi_i$, we have the MLE of $\nu_i$. If we summarize the preceding statements, we have:


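Application 4. If our model is given by

$$y_{ij} = \nu_i + m_{ij},\qquad z_{ij} = H\nu_i + g_{ij};\qquad i = 1,2,\ldots,k;\ j = 1,2,\ldots,n_i,$$

where $y_{ij}$, $\nu_i$, $m_{ij}$, $z_{ij}$, $H$ and $g_{ij}$ are defined in the paragraph following (1.3.5), then the MLE's of $H$, $\nu_i$ and $\Sigma$ are given by

$$\hat{H} = -\left(\hat{B}^{(2)}\right)^{-1}\hat{B}^{(1)},$$

$$\hat\nu_i = (I_{p-r}, 0)\left[\begin{pmatrix}\bar y_i \\ \bar z_i\end{pmatrix} - W\hat{B}'(\hat{B}W\hat{B}')^{-1}\hat{B}\begin{pmatrix}\bar y_i \\ \bar z_i\end{pmatrix}\right],$$

$$\hat\Sigma = N^{-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i}\begin{pmatrix} y_{ij} - \hat\nu_i \\ z_{ij} - \hat H\hat\nu_i\end{pmatrix}\begin{pmatrix} y_{ij} - \hat\nu_i \\ z_{ij} - \hat H\hat\nu_i\end{pmatrix}',$$

where $\bar y_i = n_i^{-1}\sum_{j=1}^{n_i}y_{ij}$, $\bar z_i = n_i^{-1}\sum_{j=1}^{n_i}z_{ij}$,

$$W = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\begin{pmatrix} y_{ij} - \bar y_i \\ z_{ij} - \bar z_i\end{pmatrix}\begin{pmatrix} y_{ij} - \bar y_i \\ z_{ij} - \bar z_i\end{pmatrix}',\qquad T = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\begin{pmatrix} y_{ij} \\ z_{ij}\end{pmatrix}\begin{pmatrix} y_{ij} \\ z_{ij}\end{pmatrix}',$$

and the columns of $\hat{B}' = (\hat{B}^{(1)}, \hat{B}^{(2)})'$ are the eigenvectors associated with the $r$ smallest eigenvalues of $W^{-1}T$.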
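Application 4 adds one step to Application 1: split $\hat B$ into its first $p-r$ and last $r$ columns and set $\hat H = -(\hat B^{(2)})^{-1}\hat B^{(1)}$. In this sketch the inputs (lists of $y$ and $z$ blocks per group) are an assumed layout. The test checks the exact relation that the bottom $r$ rows of $\hat\xi_i$ equal $\hat H\hat\nu_i$, and, on a hypothetical simulated example with $H = (0.5, -1)$, that $\hat H$ lands near $H$.

```python
import numpy as np


def app4_mle(y_groups, z_groups, r):
    """Application 4 sketch. y_groups[i]: (p-r) x n_i array, z_groups[i]: r x n_i array."""
    groups = [np.vstack([y, z]) for y, z in zip(y_groups, z_groups)]   # x_ij = (y_ij', z_ij')'
    X = np.hstack(groups)
    p, N = X.shape
    q = p - r
    xbar = np.column_stack([g.mean(axis=1) for g in groups])
    W = sum((g - g.mean(axis=1, keepdims=True)) @ (g - g.mean(axis=1, keepdims=True)).T
            for g in groups)
    T = X @ X.T
    L = np.linalg.cholesky(W)
    Linv = np.linalg.inv(L)
    _, Y = np.linalg.eigh(Linv @ T @ Linv.T)
    B = np.linalg.solve(L.T, Y[:, :r]).T               # B-hat = (B1, B2), r smallest roots
    B1, B2 = B[:, :q], B[:, q:]
    H_hat = -np.linalg.solve(B2, B1)                   # -(B2)^{-1} B1
    M = W @ B.T @ np.linalg.solve(B @ W @ B.T, B)
    xi_hat = xbar - M @ xbar                           # Application 1 estimate of xi_i
    nu_hat = xi_hat[:q]                                # top p - r rows
    fitted = np.vstack([nu_hat, H_hat @ nu_hat])
    resid = np.hstack([g - fitted[:, [i]] for i, g in enumerate(groups)])
    return H_hat, nu_hat, resid @ resid.T / N, xi_hat
```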

Remark: Application 4 is very similar to a model discussed by Gleser and Watson [1973]. In Chapter 3, we will be discussing models which are generalizations of Gleser and Watson's model.

We could make minor alterations to the model we just discussed. For instance, we could estimate parameters in the following model:

$$y_{ij} = \nu_i + m_{ij},\qquad z_{ij} = H\nu_i + \alpha + g_{ij},$$

where all the terms (except $\alpha$) are defined in the previous application. The maximum likelihood estimates can be derived from Application 2 in a manner analogous to the way we derived the estimates for Application 4 from Application 1.

Similarly, Application 3 could be extended to cover the following model:

$$y_{ij} = \mu_1 + \nu_i + m_{ij},\qquad z_{ij} = \mu_2 + H\nu_i + g_{ij},$$

where all the terms (except $\mu_1$ and $\mu_2$) are the same as in Application 4.
CHAPTER 2

TESTING THE EXISTENCE OF UNKNOWN LINEAR RESTRICTIONS
IN THE CLASSICAL MULTIVARIATE LINEAR REGRESSION MODEL


2.0 Introduction

Let our model be the same model we considered in Chapter 1:

$$X = \Xi F + E, \tag{2.0.1}$$

where $X$ is a $p\times N$ matrix of observations, $\Xi$ is an unknown $p\times k$ ($p \le k \le N$) parameter matrix, $F$ is a known $k\times N$ matrix of covariates, and $E$ is a $p\times N$ matrix of errors. We assume that each column of $E$ is independent of every other column. We also assume that each column of $E$ has a normal distribution with mean vector 0 and unknown covariance matrix $\Sigma$. In this chapter we are concerned with testing

$$H_0: B\Xi = \alpha a\quad\text{against}\quad H_1: B\Xi \neq \alpha a, \tag{2.0.2}$$

where $B$ is an unknown $r\times p$ matrix, $\alpha$ is an unknown $r\times s$ ($s \le r < p$) matrix, and $a$ is a known $s\times k$ matrix. We will derive results when $a$ is of full row rank. For the case when $a$ is the zero matrix, i.e., when we test $B\Xi = 0$ versus $B\Xi \neq 0$, we will merely state our results, since in this case all results can be derived in a way analogous to the case when $a$ is of full row rank.


In Section 2.1 we will find the likelihood ratio test statistic of $H_0$ versus $H_1$ and mention similarities to test statistics of Rao [1965] and Kristoff [1973]. Section 2.2 will contain a discussion of the asymptotic distribution of the roots of which the likelihood ratio statistic is a function. We will be concerned with cases when the number of parameters increases with the sample size. In Section 2.3, we will use the results of the preceding section to get the asymptotic distributions of the likelihood ratio test statistic and therefore asymptotic tests of $H_0$ vs. $H_1$. Section 2.4 will contain a proof that the tests described in Section 2.3 are consistent.


2.1. Likelihood Ratio Test Statistic

In this section we will find the likelihood ratio test of $H_0: B\Xi = \alpha a$ versus $H_1: B\Xi \neq \alpha a$ when our model is

$$X = \Xi F + E. \tag{2.1.1}$$

All variables are defined in the introduction of this chapter.

In Chapter 1 we derived the maximum likelihood estimators of the parameters under $H_0$. If we substitute those estimators into the likelihood function (see (1.1.24) and (1.1.19)), we have

$$\max_{H_0} L(X, B, \Xi, \alpha, \Sigma) = (2\pi)^{-pN/2}e^{-pN/2}\,|N^{-1}W|^{-N/2}\,(\lambda_p\lambda_{p-1}\cdots\lambda_{p-r+1})^{-N/2}, \tag{2.1.2}$$

where

$$W = X\left(I_N - F'(FF')^{-1}F\right)X', \tag{2.1.3}$$

$$T = X\left(I_N - F'a'(aFF'a')^{-1}aF\right)X', \tag{2.1.4}$$

and $\lambda_i$ is the $i$th largest eigenvalue of $W^{-1}T$. We may use standard multivariate regression procedures to get the maximum value of the likelihood function when $H_1$ is true:

$$\max_{H_1} L(X, B, \Xi, \alpha, \Sigma) = (2\pi)^{-pN/2}e^{-pN/2}\,|N^{-1}W|^{-N/2}, \tag{2.1.5}$$

where $W$ is defined above. If we combine (2.1.2) and (2.1.5) we will be able to get the likelihood ratio test statistic of $H_0$ versus $H_1$. Our result is summarized in the following theorem.

Theorem 2.1.1. If our model is given by (2.1.1) and we wish to test the hypothesis $H_0: B\Xi = \alpha a$ versus $H_1: B\Xi \neq \alpha a$ ($a$ has full row rank), then the likelihood ratio test statistic is

$$\Lambda = \frac{\max_{H_0} L(X, B, \Xi, \alpha, \Sigma)}{\max_{H_1} L(X, B, \Xi, \alpha, \Sigma)} = (\lambda_p\lambda_{p-1}\cdots\lambda_{p-r+1})^{-N/2},$$

where $\lambda_i$ is the $i$th largest eigenvalue of $W^{-1}T$, and $W$ and $T$ are defined by (2.1.3) and (2.1.4) respectively.


Remark: When $a$ is the zero matrix, the likelihood ratio test statistic is identical to that given in Theorem 2.1.1 except that $T$ is equal to $XX'$.
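The statistic of Theorem 2.1.1 is straightforward to compute from (2.1.3) and (2.1.4). In this sketch the function returns $-2\log\Lambda = N\sum_{i=p-r+1}^{p}\log\lambda_i$ (a convenient scale, not something the theorem prescribes), together with a numerical check of the identity $|N\hat\Sigma_0| = |W|\,\lambda_p\cdots\lambda_{p-r+1}$, where $N\hat\Sigma_0 = W + W\hat B'(\hat BW\hat B')^{-1}\hat B(T - W)\hat B'(\hat BW\hat B')^{-1}\hat BW$ is the restricted estimate of Chapter 1; that identity is what turns the restricted maximum into (2.1.2). The inputs in the test are random and hypothetical.

```python
import numpy as np


def proj(A):
    """Orthogonal projection onto the row space of A (full row rank assumed)."""
    return A.T @ np.linalg.solve(A @ A.T, A)


def lr_statistic(X, F, a, r):
    """Return (-2 log Lambda, the r smallest roots of W^{-1}T, relative determinant gap)."""
    N = X.shape[1]
    I = np.eye(N)
    W = X @ (I - proj(F)) @ X.T                    # (2.1.3)
    T = X @ (I - proj(a @ F)) @ X.T                # (2.1.4)
    L = np.linalg.cholesky(W)
    Linv = np.linalg.inv(L)
    vals, Y = np.linalg.eigh(Linv @ T @ Linv.T)    # eigenvalues of W^{-1}T, ascending
    lam = vals[:r]                                 # lambda_p, ..., lambda_{p-r+1}
    stat = N * np.sum(np.log(lam))                 # -2 log Lambda
    # Check |N Sigma-hat_0| = |W| * prod(lam), the step behind (2.1.2)
    B = np.linalg.solve(L.T, Y[:, :r]).T
    M = W @ B.T @ np.linalg.solve(B @ W @ B.T, B)
    n_sigma0 = W + M @ (T - W) @ M.T
    gap = np.linalg.det(n_sigma0) / (np.linalg.det(W) * np.prod(lam)) - 1.0
    return stat, lam, gap
```

Because $T - W = X\big(F'(FF')^{-1}F - F'a'(aFF'a')^{-1}aF\big)X'$ is positive semidefinite, every $\lambda_i \ge 1$, so the statistic is nonnegative.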

We also have the following corollary:

Corollary 2.1.1. Let our model be

$$x_{ij} = \xi_i + e_{ij};\qquad i = 1,2,\ldots,k;\ j = 1,2,\ldots,n_i;$$

where $x_{ij}$ is a $p$-dimensional vector of observed values, $\xi_i$ is the mean of the $i$th group of observations and $e_{ij}$ is a $p$-dimensional error vector. We assume that the errors are independently distributed with a normal distribution having mean vector 0 and unknown covariance matrix $\Sigma$. The likelihood ratio test statistic of the hypothesis $H_0: B\xi_i = \alpha$ for all $i$ versus $H_1: B\xi_i \neq \alpha$ for some $i$, where $B$ is an unknown $r\times p$ matrix and $\alpha$ is an unknown $r$-dimensional vector, is

$$\Lambda = (\lambda_p\lambda_{p-1}\cdots\lambda_{p-r+1})^{-N/2},$$

where $\lambda_i$ is the $i$th largest eigenvalue of $W^{-1}T$ and

$$W = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar x_i)(x_{ij} - \bar x_i)',\qquad T = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar x)(x_{ij} - \bar x)',$$

$$\bar x_i = n_i^{-1}\sum_{j=1}^{n_i}x_{ij},\qquad \bar x = N^{-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i}x_{ij},\qquad N = \sum_{i=1}^{k}n_i.$$
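The link between Theorem 2.1.1 and Corollary 2.1.1 is that, with $F$ of the form (1.3.1) and $a = (1, \ldots, 1)$, the projection forms (2.1.3) and (2.1.4) reduce to the group sums above. This sketch checks that numerically for arbitrary, hypothetical group sizes; `indicator_F` is a helper name introduced here.

```python
import numpy as np


def proj(A):
    """Orthogonal projection onto the row space of A (full row rank assumed)."""
    return A.T @ np.linalg.solve(A @ A.T, A)


def indicator_F(sizes):
    """The k x N matrix (1.3.1): row i has n_i consecutive ones."""
    F = np.zeros((len(sizes), sum(sizes)))
    start = 0
    for i, n in enumerate(sizes):
        F[i, start:start + n] = 1.0
        start += n
    return F


def corollary_forms_gap(sizes, p=3, seed=0):
    """Max abs difference between the projection and group-sum forms of W and T."""
    rng = np.random.default_rng(seed)
    F = indicator_F(sizes)
    N = F.shape[1]
    X = rng.normal(size=(p, N))
    a = np.ones((1, len(sizes)))                   # a = (1, ..., 1)
    I = np.eye(N)
    W_proj = X @ (I - proj(F)) @ X.T
    T_proj = X @ (I - proj(a @ F)) @ X.T
    W_sum = np.zeros((p, p))
    bounds = np.cumsum([0] + list(sizes))
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        D = X[:, lo:hi] - X[:, lo:hi].mean(axis=1, keepdims=True)   # x_ij - xbar_i
        W_sum += D @ D.T
    C = X - X.mean(axis=1, keepdims=True)                          # x_ij - xbar
    T_sum = C @ C.T
    return max(np.max(np.abs(W_proj - W_sum)), np.max(np.abs(T_proj - T_sum)))


print(corollary_forms_gap([3, 5, 2, 4]))
```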


Corollary 2.1.1 follows from Theorem 2.1.1 just as Application 2 followed from Theorem 1.1.2 in Chapter 1.

The reason we mention Corollary 2.1.1 is that the hypothesis we are testing in that corollary is exactly the hypothesis of dimensionality in Rao [1973]. Rao derives the likelihood ratio test statistic when $\Sigma$ is known. He does not derive the likelihood ratio test when $\Sigma$ is unknown; however, he does give an alternative test which is also based on the smallest roots of $W^{-1}T$. He gives an asymptotic test based on his test statistic which is valid only when $k$, the number of groups, is fixed. If we use the model in Corollary 2.1.1, we may believe that the number of groups should increase when the sample size increases. The asymptotic test in this case would not be the same as when $k$ is fixed (see Section 2.3).

Kristoff [1973] considered testing an unspecified linear relationship in several models. In the basic model (his case 1), we measure a person's scores on two tests. We assume there is an equivalent form of each test available. A person's scores are equal to that person's abilities (true scores) plus an error term. We summarize this model with the following equation:

$$x_{ij} = \mu + \xi_i + e_{ij};\qquad i = 1,2,\ldots,k;\ j = 1,2;\ N = 2k;$$

where $x_{ij}$ is a 2-dimensional vector whose elements are the $i$th person's scores on the $j$th form of the two tests, $\mu$ is the average person's true scores on the two tests (it is the same for either form of the two tests), $\xi_i$ is the difference between the $i$th person's true scores and the average person's true scores, and $e_{ij}$ is the error term. The error terms are all pairwise independent. Each has a normal distribution with mean vector 0 and unknown covariance matrix $\Sigma$. We wish to test the hypothesis that a single unspecified linear relation exists against the hypothesis that none exists, i.e., we test

$$H_0: B\xi_i = 0\ \ \forall i\qquad\text{versus}\qquad H_1: B\xi_i \neq 0\ \text{for some } i,$$

where $B$ is an unknown 2-dimensional row vector. We found the maximum likelihood estimators of the parameters when $H_0$ is true in Application 3 of Chapter 1 under the assumption that $\sum_{i=1}^{k}\xi_i = 0$. We could also get the MLE's when $H_1$ is true using the usual theory of multivariate linear regression. If we put these together, we would get the likelihood ratio test statistic, which turns out to be a function of the smallest eigenvalue of $W^{-1}T$; $W$ and $T$ are given in Application 3.

The smallest eigenvalue of $W^{-1}T$ is the same statistic Kristoff recommends. If we were to increase our sample size in this example, we would probably increase the number of people in our sample and not the number of equivalent forms of each test; that is, we would assume that $k$ increases as $N$ does and that

$$\lim_{N\to\infty}\frac{N-k}{N} = \lim_{k\to\infty}\frac{2k-k}{2k} = \frac{1}{2}.$$

This example, therefore, provides us with a situation in which the number of parameters does not stay fixed as the sample size increases.

In case 2 of Kristoff, we have exactly the same model as the above with one minor change: the difference between the score on one form of the two tests and the score on the other form of the two tests need not have expectation 0. Our model is

$$x_{ij} = \xi_i + \theta_j + e_{ij};\qquad i = 1,2,\ldots,k;\ j = 1,2;$$

where $x_{ij}$, $\xi_i$, and $e_{ij}$ are defined as before and $\theta_j$ is the expected true score on the $j$th form of the two tests. Again we will test $H_0: B\xi_i = 0$ for all $i$ versus $H_1: B\xi_i \neq 0$ for some $i$, where $B$ is an unknown 2-dimensional row vector. If we estimate $\theta_j$ first, we can find the MLE's of the parameters when the hypothesis is true. (See Application 3 of Chapter 1 for the type of argument needed.) When the hypothesis is false, it is also easy to get the MLE's. The likelihood ratio test statistic is a function of the smallest eigenvalue of $W^{-1}T$, where

$$W = \sum_{i=1}^{k}\sum_{j=1}^{2}(x_{ij} - \bar x_{i\cdot} - \bar x_{\cdot j} + \bar x)(x_{ij} - \bar x_{i\cdot} - \bar x_{\cdot j} + \bar x)',\qquad T = \sum_{i=1}^{k}\sum_{j=1}^{2}(x_{ij} - \bar x_{\cdot j})(x_{ij} - \bar x_{\cdot j})',$$

$$\bar x_{i\cdot} = \tfrac{1}{2}(x_{i1} + x_{i2}),\qquad \bar x_{\cdot j} = k^{-1}\sum_{i=1}^{k}x_{ij},\qquad \bar x = (2k)^{-1}\sum_{i=1}^{k}\sum_{j=1}^{2}x_{ij}.$$

The smallest eigenvalue of $W^{-1}T$ is also the statistic Kristoff recommends.
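The two-way matrices can be related to the projection forms used elsewhere in this chapter. In this sketch the person indicators plus an indicator for the second form play the role of $F$ (an assumption of the sketch: one full-row-rank coding of the additive layout $\xi_i + \theta_j$), the form indicators play the role of the restricted design, and the columns of $X$ are ordered person by person. It checks that $X(I - F'(FF')^{-1}F)X'$ equals the interaction-residual sum for $W$ and that removing only the form means gives $T$.

```python
import numpy as np


def proj(A):
    """Orthogonal projection onto the row space of A (full row rank assumed)."""
    return A.T @ np.linalg.solve(A @ A.T, A)


def kristoff_case2_gap(k=7, seed=0):
    """Compare the two-way W and T sums with projection forms."""
    rng = np.random.default_rng(seed)
    N = 2 * k
    X = rng.normal(size=(2, N))                    # column 2*i + j holds x_{i+1, j+1}
    person = np.kron(np.eye(k), np.ones((1, 2)))   # k x N person indicators
    form = np.kron(np.ones((1, k)), np.eye(2))     # 2 x N form indicators
    F = np.vstack([person, form[1:]])              # assumed coding of xi_i + theta_j
    I = np.eye(N)
    W_proj = X @ (I - proj(F)) @ X.T
    T_proj = X @ (I - proj(form)) @ X.T
    x = X.reshape(2, k, 2)                         # x[:, i, j]
    row = x.mean(axis=2, keepdims=True)            # xbar_{i.}
    col = x.mean(axis=1, keepdims=True)            # xbar_{.j}
    grand = x.mean(axis=(1, 2), keepdims=True)     # xbar
    Rw = (x - row - col + grand).reshape(2, N)
    Rt = (x - col).reshape(2, N)
    return max(np.max(np.abs(W_proj - Rw @ Rw.T)), np.max(np.abs(T_proj - Rt @ Rt.T)))


print(kristoff_case2_gap())
```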


2.2. Asymptotic Distributions of the Roots

In this section, we find the asymptotic distribution of the roots needed in the likelihood ratio tests under the null hypothesis $H_0: B\Xi = \alpha a$. The roots in which we are interested are the smallest roots of

$$|T - \lambda_N W| = 0, \tag{2.2.1}$$

where

$$T = X\left(I_N - F'a'(aFF'a')^{-1}aF\right)X', \tag{2.2.2}$$

$$W = X\left(I_N - F'(FF')^{-1}F\right)X'. \tag{2.2.3}$$

Throughout this section, certain variables are subscripted with an "N" indicating that those variables are connected with a sample of size $N$. We will let $\lambda_{iN}$ be the $i$th largest root of (2.2.1) when our sample is of size $N$.

It is helpful to find the distribution of the smallest roots of

$$|T - W - \phi_N W| = 0.$$

Note that if $\phi_{iN}$ is the $i$th largest root of the above expression, then $\phi_{iN} + 1 = \lambda_{iN}$, where $\lambda_{iN}$ is the $i$th largest root of (2.2.1).

All our theorems are results in terms of $\phi_{iN}$. We will assume that $a$ is of full row rank.

In this section, we discuss cases when the number of parameters increases with the sample size. We already mentioned that the models of Kristoff [1973] provide us with examples where it is reasonable to assume that the number of parameters increases with the sample size. A measure of how fast the number of parameters increases will be

$$t = \lim_{N\to\infty}\frac{N-k}{N} = 1 - \lim_{N\to\infty}\frac{k}{N}.$$

There are three possible cases:

Case 1: $k$ is fixed;

Case 2: $t \neq 1$;

Case 3: $k$ goes to infinity as $N$ does; $t = 1$.

We will always assume that $r$ (the number of rows of $B$), $p$ (the number of rows of $X$) and $s$ (the row rank of $a$) are fixed.

When $k$ is fixed, the asymptotic distribution of the $r$ smallest roots (from Anderson [1951b] and from Hsu [1941]) is the following:
Theorem 2.2.1. Let $\rho_{iN} = N\phi_{iN}$, $i = p-r+1, \ldots, p$, where $\phi_{iN}$ is the $i$th largest root of $|T - W - \phi_N W| = 0$. Then the limiting distribution of $(\rho_{p-r+1,N}, \rho_{p-r+2,N}, \ldots, \rho_{pN})$ when $k$ is fixed is

$$\frac{\pi^{r/2}\,2^{-r(k-s-p+r)/2}}{\prod_{i=1}^{r}\Gamma\!\left(\tfrac{1}{2}(k-s-p+r+1-i)\right)\Gamma\!\left(\tfrac{1}{2}(r+1-i)\right)}\;\prod_{i=p-r+1}^{p}\rho_{iN}^{(k-s-p-1)/2}\;\exp\Big(-\tfrac{1}{2}\sum_{i=p-r+1}^{p}\rho_{iN}\Big)\prod_{i=p-r+1}^{p}\;\prod_{j=i+1}^{p}(\rho_{iN} - \rho_{jN}).$$

The above distribution is the same as the joint distribution of the $\rho_i$, where $\rho_i$ is the $i$th largest root of

$$|J - \rho I_r| = 0,$$

and $J$ is defined by

$$J = \sum_{i=1}^{k-s-p+r}u_iu_i',$$

where the $u_i$ are independently distributed with a normal distribution with mean 0 and covariance matrix $I_r$.

Remark. Theorem 2.2.1 can also be used when $a$ is the zero matrix by letting $s = 0$.
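A quick sanity check on the normalizing constant in Theorem 2.2.1 is that the density integrates to one. The sketch below writes $n = k - s - p + r$ (the number of terms in $J$), evaluates the density on the region $\rho_1 > \cdots > \rho_r$, and integrates numerically for $r = 2$, $n = 5$ on a midpoint grid (grid size and range are arbitrary choices). For $r = 1$ the formula should reduce to the chi-square density with $n$ degrees of freedom, which the test also checks at one point.

```python
import math

import numpy as np


def root_density(rho, n):
    """Joint density of ordered roots rho_1 > ... > rho_r (last axis), n = k - s - p + r."""
    rho = np.asarray(rho, dtype=float)
    r = rho.shape[-1]
    log_c = 0.5 * r * math.log(math.pi) - 0.5 * n * r * math.log(2.0)
    log_c -= sum(math.lgamma(0.5 * (n + 1 - i)) + math.lgamma(0.5 * (r + 1 - i))
                 for i in range(1, r + 1))
    vandermonde = np.ones(rho.shape[:-1])          # product of (rho_i - rho_j), i < j
    for i in range(r):
        for j in range(i + 1, r):
            vandermonde = vandermonde * (rho[..., i] - rho[..., j])
    body = np.prod(rho, axis=-1) ** (0.5 * (n - r - 1)) * np.exp(-0.5 * rho.sum(axis=-1))
    return math.exp(log_c) * body * vandermonde


# r = 2, n = 5: integrate over rho_1 > rho_2 > 0 on a midpoint grid.
h = 0.05
g = (np.arange(1200) + 0.5) * h
R1, R2 = np.meshgrid(g, g, indexing="ij")
dens = root_density(np.stack([R1, R2], axis=-1), n=5)
total = dens[R1 > R2].sum() * h * h
print(total)
```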

We now derive the asymptotic distribution of the roots in Case 2 and Case 3. The asymptotic distribution of the smallest roots in these cases is markedly different from the distribution of the roots given in Theorem 2.2.1. Before we state and prove several theorems which give the asymptotic distribution of the roots in Cases 2 and 3, we need to derive several lemmas.


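Lemma 1. Let our model be

$$X = \Xi F + E,\qquad B\Xi = \alpha a,$$

where $X$, $\Xi$, $F$, $E$, $B$, $\alpha$, and $a$ are defined in the introduction to this chapter. The roots of

$$|T - W - \phi_N W| = 0,$$

where $T$ and $W$ are given by (2.2.2) and (2.2.3), have the same distribution as the roots of

$$\left|(N-k)^{-1}U^*U^{*\prime} + N^{1/2}(N-k)^{-1}C + N(N-k)^{-1}D_N - \phi_N\left((N-k)^{-1/2}Z + I_p\right)\right| = 0, \tag{2.2.4}$$

where $D_N = \operatorname{diag}(\gamma_{1N}, \ldots, \gamma_{p-r,N}, 0, \ldots, 0)$ is $p\times p$; $C$ is the $p\times p$ matrix whose $(i,j)$ element is $\gamma_{jN}^{1/2}u^*_{ij} + \gamma_{iN}^{1/2}u^*_{ji}$ (with $\gamma_{iN} = 0$ for $i > p-r$); $Z = (N-k)^{-1/2}\left(VV' - (N-k)I_p\right)$; $\gamma_{iN}$ is the $i$th largest eigenvalue of

$$N^{-1}\Xi F\left(I_N - F'a'(aFF'a')^{-1}aF\right)F'\Xi'\Sigma^{-1};$$

and finally $U^*$ and $V$ are $p\times(k-s)$ and $p\times(N-k)$ matrices, respectively, whose columns have independent normal distributions with mean vector 0 and covariance matrix $I_p$.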

Proof. For any invertible $p\times p$ matrix $\Theta$ we know that

$$|\Theta(T - W)\Theta' - \phi\,\Theta W\Theta'| = |\Theta|^2\,|T - W - \phi W|,$$

so the roots $\phi$ are the same whether we observe $X$ or $\Theta X$. Since we may pick $\Theta$ so that $\Theta\Sigma\Theta' = I_p$, we may assume, without loss of generality, that the columns of $X$ have $p$-variate normal distributions with mean vectors equal to the respective columns of $\Theta\Xi F$ and with common covariance matrix $I_p$.

Next we will let $V_1$, $V_2$, $V_3$ be column-orthogonal matrices such that

$$V_1V_1' = F'a'(aFF'a')^{-1}aF,\qquad V_2V_2' = I_N - F'(FF')^{-1}F,\qquad V_3V_3' = F'(FF')^{-1}F - F'a'(aFF'a')^{-1}aF.$$

It is easy to see that such matrices exist, since the three right-hand sides are mutually orthogonal projection matrices of ranks $s$, $N-k$ and $k-s$ which sum to $I_N$. Let

$$Y = (Y_1, Y_2, Y_3) = (XV_1, XV_2, XV_3) = X(V_1, V_2, V_3),$$

where $Y_1$ is $p\times s$, $Y_2$ is $p\times(N-k)$, and $Y_3$ is $p\times(k-s)$. Since $(V_1, V_2, V_3)$ is an orthogonal matrix, the columns of $Y$ are independently normally distributed with covariance matrix $I_p$.
The roots whose asymptotic distribution we are seeking are functions of $Y_2$ and $Y_3$ only. Since $Y_1$ is independent of $(Y_2, Y_3)$, we may eliminate it from our considerations. Since $FV_2 = 0$, the density of $Y_2$, $Y_3$ is

$$\begin{aligned}
&\text{constant}\cdot\exp\left\{-\tfrac{1}{2}\operatorname{tr}\left[(Y_2 - \Theta\Xi FV_2)(Y_2 - \Theta\Xi FV_2)' + (Y_3 - \Theta\Xi FV_3)(Y_3 - \Theta\Xi FV_3)'\right]\right\} \\
&\quad= \text{constant}\cdot\exp\left\{-\tfrac{1}{2}\operatorname{tr}\left[Y_2Y_2' + (Y_3 - \Theta\Xi FV_3)(Y_3 - \Theta\Xi FV_3)'\right]\right\}.
\end{aligned}$$

We want the distribution of the roots $(\phi_{1N}, \phi_{2N}, \ldots, \phi_{pN})$ of

$$|T - W - \phi_N W| = |Y_3Y_3' - \phi_N Y_2Y_2'| = 0.$$

We know that $Y_2Y_2'$ has a central Wishart distribution and that $Y_3Y_3'$ has a noncentral Wishart distribution.

We now let

$$G = \Theta\Xi FV_3. \tag{2.2.5}$$

We can write the noncentrality parameter in the distribution of $Y_3Y_3'$ in terms of $GG'$:

$$\begin{aligned}
GG' &= \Theta\Xi FV_3V_3'F'\Xi'\Theta' = \Theta\Xi F\left(F'(FF')^{-1}F - F'a'(aFF'a')^{-1}aF\right)F'\Xi'\Theta' \\
&= \Theta\Xi F\left(I_N - F'a'(aFF'a')^{-1}aF\right)F'\Xi'\Theta'.
\end{aligned} \tag{2.2.6}$$

Let $(\gamma_{1N}, \gamma_{2N}, \ldots, \gamma_{pN})$ be the ordered eigenvalues of $N^{-1}GG'$. Under the hypothesis that $B\Xi = \alpha a$, the $r$ smallest eigenvalues of $N^{-1}GG'$ will equal zero, since $B\Theta^{-1}G = \alpha aFV_3 = 0$.

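Next, we transform $Y_2$ and $Y_3$ in such a way that only a few elements of the resulting matrices depend on the nonzero eigenvalues of $N^{-1}GG'$. Consider

$$U = \Gamma_1Y_3\Gamma_2\qquad\text{and}\qquad V = \Gamma_1Y_2,$$

where $\Gamma_1$ and $\Gamma_2$ are orthogonal matrices which make

$$\Gamma_1G\Gamma_2 = N^{1/2}\begin{pmatrix}\operatorname{diag}\left(\gamma_{1N}^{1/2}, \gamma_{2N}^{1/2}, \ldots, \gamma_{p-r,N}^{1/2}\right) & 0 \\ 0 & 0\end{pmatrix}\qquad (p\times(k-s)).$$

Since $\Gamma_1$ and $\Gamma_2$ are both orthogonal matrices, the density of $U$ and $V$ is

$$\text{constant}\cdot\exp\left\{-\tfrac{1}{2}\Big[\operatorname{tr}(UU' + VV') - 2\sum_{i=1}^{p-r}(N\gamma_{iN})^{1/2}u_{ii} + N\sum_{i=1}^{p-r}\gamma_{iN}\Big]\right\}, \tag{2.2.7}$$

where $u_{ii}$ is the $(i,i)$th element of $U$.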

We want the distribution of the roots of

$$|UU' - \phi_N VV'| = |\Gamma_1Y_3Y_3'\Gamma_1' - \phi_N\Gamma_1Y_2Y_2'\Gamma_1'| = |Y_3Y_3' - \phi_N Y_2Y_2'| = 0. \tag{2.2.8}$$

We should mention that $U$ is $p\times(k-s)$ and that $V$ is $p\times(N-k)$.

Finally, we will make several substitutions which will give us our lemma. Let

$$U^* = U - \Gamma_1G\Gamma_2.$$

The joint density of $U^*$ and $V$ is

$$\text{constant}\cdot\exp\left[-\tfrac{1}{2}\operatorname{tr}(U^*U^{*\prime} + VV')\right], \tag{2.2.9}$$

i.e., the columns of $U^*$, which is $p\times(k-s)$, and the columns of $V$, which is $p\times(N-k)$, are independently distributed with covariance matrix $I_p$ and mean vector 0. We therefore have
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n 


(2.2.10)  $UU' = U^*U^{*\prime} + N^{1/2}C + (N-k)(D_1 + \phi_N I_p),$

where $C$ and $D_1$ are given in the statement of this lemma. Our final substitution is

(2.2.11)  $Z = (N-k)^{-1/2}\left(VV' - (N-k)I_p\right) = (N-k)^{-1/2}VV' - (N-k)^{1/2}I_p.$

The lemma now follows through a substitution of (2.2.10) and (2.2.11) into (2.2.8). Q.E.D.

Lemma 2: If $P$ is a $p\times k$ matrix whose columns are independent and each have a normal distribution with mean 0 and covariance matrix $I_p$, then, as $k\to\infty$, each element of the matrix $k^{-1/2}(PP' - kI_p)$ asymptotically has a normal distribution with mean 0, variance 2 for diagonal elements and variance 1 for off-diagonal elements. (A diagonal element is $k^{-1/2}$ times a centered $\chi^2_k$ variable, whose variance is $2k$.) All elements on and above the diagonal are asymptotically independent. We will call this asymptotic distribution the p-dimensional matrix normal distribution.

Proof. Use Theorem 4.2.4 in Anderson [1958].

Remark. If we let $(PP')_{22}$ be the $r\times r$ matrix which comprises the lower right-hand corner of $PP'$, then

$k^{-1/2}\left((PP')_{22} - kI_r\right)$

asymptotically has an r-dimensional matrix normal distribution.
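A small Monte Carlo experiment illustrates the variances in Lemma 2: the diagonal entries of $k^{-1/2}(PP' - kI_p)$ have variance 2 and the off-diagonal entries variance 1 (both exactly, for every $k$). The sizes, number of replications and seed below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
p, k, reps = 3, 400, 4000
diag_vals = np.empty(reps)
off_vals = np.empty(reps)
for j in range(reps):
    P = rng.standard_normal((p, k))                 # independent N(0, I_p) columns
    A = (P @ P.T - k * np.eye(p)) / np.sqrt(k)      # k^{-1/2}(PP' - k I_p)
    diag_vals[j] = A[0, 0]
    off_vals[j] = A[0, 1]

print(diag_vals.var(), off_vals.var())   # close to 2 and 1 respectively
```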

We now state several assumptions which we will make when we discuss the asymptotic distribution of the roots for Cases 2 and 3.

Assumption 1. The matrix $N^{-1}GG'$ defined by (2.2.6) converges to a finite matrix $R$ which has rank $p-r$.

Assumption 2. $\gamma_{iN} = \gamma_i + o(N^{-1/2})$, where $\gamma_i$ is the $i$th largest eigenvalue of $R$ and $\gamma_{iN}$ is the $i$th largest eigenvalue of $N^{-1}GG'$.

Assumption 3. The nonzero roots of $R$ have multiplicity one.

Assumption 3 is not necessary; without it, the proof given below would have to be altered to apply when the nonzero roots of $R$ do not have multiplicity one. Since the alterations only complicate matters, and since they do not affect the distribution of the smallest roots, we will omit them.

Part 1. Case 2: $\lim_{N\to\infty}(N-k)/N = t < 1$.

In the following theorem, we give the asymptotic distribution of the roots when $\lim_{N\to\infty}(N-k)/N = t < 1$.

Theorem 2.2.2. Assume that $\lim_{N\to\infty}(N-k)/N = t < 1$, and that Assumptions 1, 2 and 3 hold. Let

$$v_{p-r+i,N} = (N-k)^{1/2}\left(\phi_{p-r+i,N} - \frac{k}{N-k}\right),\quad i = 1, 2, \dots, r,$$

where $\phi_{iN}$ is the $i$th largest root of $TW^{-1} - I_p$. The limiting distribution of $(v_{p-r+1,N}, v_{p-r+2,N}, \dots, v_{pN})$ is the same as the distribution of the $r$ roots $v$ of

$$\left|(1/t-1)^{1/2}Q_1 - (1/t-1)Q_2 - vI_r\right| = 0,$$

where $Q_1$ and $Q_2$ are independent and each has the r-dimensional matrix normal distribution (see Lemma 2).
Proof. By Lemma 1, we only have to consider the distribution of the $r$ smallest roots ($\phi_{iN}$; $i = p-r+1, p-r+2, \dots, p$) of

(2.2.12)  $$\left|(N-k)^{-1}U^*U^{*\prime} + N^{1/2}(N-k)^{-1}C - \phi_N(N-k)^{-1/2}Z + D_1\right| = 0,$$

where $C$, $Z$, and $D_1$ are defined in Lemma 1, and the columns of $U^*$, $V$ have independent normal distributions with mean vector 0 and covariance matrix $I_p$.

Consider the following matrix:

$A = k^{-1/2}(U^*U^{*\prime} - kI_p).$

If we substitute $A$ into (2.2.12), we get the following equation:

(2.2.13)  $$\left|k^{1/2}(N-k)^{-1}A + N^{1/2}(N-k)^{-1}C - \phi_N(N-k)^{-1/2}Z + D_1 + k(N-k)^{-1}I_p\right| = 0.$$

By Lemma 2, $A$ and $Z$ asymptotically have p-dimensional matrix normal distributions. It is easy to see that $C$ is asymptotically independent of $A$ and $Z$. The elements of $C$ are functions of the first $p$ columns of $U^*$. Since $A$ is the same asymptotically if we delete the first $p$ columns of $U^*$, $C$ and $A$ can be thought of as functions of different variables asymptotically. The asymptotic distribution of $C$ can be obtained by using the definition of $C$.

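Consider the following variable:

$$\phi_{p-r+i,N} = \frac{k}{N-k} + \frac{v_{p-r+i}}{\sqrt{N-k}}.$$

We can substitute the above expression into (2.2.13) for $\phi_N$ and try to find out what $v_{p-r+i}$ must be distributed as when $N\to\infty$. Our result is

(2.2.14)  $$\left|k^{1/2}(N-k)^{-1}A + N^{1/2}(N-k)^{-1}C - \left(k(N-k)^{-3/2} + (N-k)^{-1}v_{p-r+i}\right)Z + D_2\right| = 0,$$

where

$$D_2 = \mathrm{diag}\left(\frac{N}{N-k}\gamma_{1N} - \frac{v_{p-r+i}}{\sqrt{N-k}}, \dots, \frac{N}{N-k}\gamma_{p-r,N} - \frac{v_{p-r+i}}{\sqrt{N-k}}, -\frac{v_{p-r+i}}{\sqrt{N-k}}, \dots, -\frac{v_{p-r+i}}{\sqrt{N-k}}\right),$$

the last $r$ diagonal entries all being equal to $-v_{p-r+i}/\sqrt{N-k}$.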


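Let us write

$$A = \begin{pmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{pmatrix},\quad C = \begin{pmatrix}C_{11} & C_{12}\\ C_{21} & 0\end{pmatrix},\quad Z = \begin{pmatrix}Z_{11} & Z_{12}\\ Z_{21} & Z_{22}\end{pmatrix},$$

where $A_{11}$, $C_{11}$ and $Z_{11}$ are all $(p-r)\times(p-r)$; $A_{12}$, $C_{12}$ and $Z_{12}$ are all $(p-r)\times r$; etc. We now discuss the upper left-hand block of $(p-r)\times(p-r)$ elements inside (2.2.14). When $N$ is large, $k^{1/2}(N-k)^{-1}A_{11}$, $N^{1/2}(N-k)^{-1}C_{11}$ and $\left(k(N-k)^{-3/2} + (N-k)^{-1}v_{p-r+i}\right)Z_{11}$ all are arbitrarily small, and so are the terms $-v_{p-r+i}/\sqrt{N-k}$ in $D_2$. The only matrix which remains is

$$\mathrm{diag}\left(\frac{N}{N-k}\gamma_{1N}, \frac{N}{N-k}\gamma_{2N}, \dots, \frac{N}{N-k}\gamma_{p-r,N}\right) \to \mathrm{diag}\left(\gamma_1/t, \gamma_2/t, \dots, \gamma_{p-r}/t\right).$$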


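The elements in the last $r$ rows and columns all go to 0 when $N$ gets large. If we multiply the last $r$ rows and columns by $(N-k)^{1/4}$, we will be able to find the terms that dominate. Almost sure convergence is indicated by $\xrightarrow{a.s.}$. First of all,

$$(N-k)^{1/4}\left[k^{1/2}(N-k)^{-1}A_{12}\right] \xrightarrow{a.s.} 0,\quad (N-k)^{1/4}\left[N^{1/2}(N-k)^{-1}C_{12}\right] \xrightarrow{a.s.} 0\quad\text{and}\quad (N-k)^{1/4}\left[\left(k(N-k)^{-3/2} + (N-k)^{-1}v_{p-r+i}\right)Z_{12}\right] \xrightarrow{a.s.} 0.$$

When we multiply the last $r$ rows and columns by $(N-k)^{1/4}$, we multiply the lower right-hand corner by $(N-k)^{1/2}$. By Lemma 2 we know that

$$\frac{k^{1/2}}{(N-k)^{1/2}}A_{22} \xrightarrow{L} (1/t-1)^{1/2}Q_1,$$

where $\xrightarrow{L}$ indicates convergence in distribution and $Q_1$ has an r-dimensional matrix normal distribution. Similarly

$$\frac{k}{N-k}Z_{22} \xrightarrow{L} (1/t-1)Q_2,$$

where $Q_2$ has an r-dimensional matrix normal distribution and is independent of $Q_1$ (since $U^*$ and $V$ are independent). The lower right-hand corner of $(N-k)^{1/2}D_2$ equals $-v_{p-r+i}I_r$; all other terms in that corner go to zero.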

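If we combine the above statements, we get that when $N$ is large (2.2.14) becomes

$$\begin{vmatrix}\mathrm{diag}(\gamma_1/t, \gamma_2/t, \dots, \gamma_{p-r}/t) & 0\\ 0 & \tilde Q\end{vmatrix} = 0,$$

where

$$\tilde Q = (1/t-1)^{1/2}Q_1 - (1/t-1)Q_2 - v_{p-r+i}I_r.$$

Since $\gamma_1, \dots, \gamma_{p-r}$ are positive, $v_{p-r+i}$ is a root of $|\tilde Q| = 0$. The distribution of $(v_{p-r+1}, v_{p-r+2}, \dots, v_p)$ is therefore the distribution of the roots of

$$\left|(1/t-1)^{1/2}Q_1 - (1/t-1)Q_2 - vI_r\right| = 0,$$

where $Q_1$ and $Q_2$ have independent r-dimensional matrix normal distributions.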

Consider

$$\phi_{iN} = \frac{k}{N-k} + \frac{v_{iN}}{\sqrt{N-k}},\quad i = p-r+1, p-r+2, \dots, p,$$

where $v_{iN}$ is picked so that we have equality in the above expression, i.e., $v_{iN}$ is defined by

$$v_{iN} = (N-k)^{1/2}\left(\phi_{iN} - \frac{k}{N-k}\right) = f_{iN}(A, Z, C).$$

We now show that the distribution of $v_{iN}$ goes to the distribution of $v_i$.

Our preceding discussion shows that when

$$\lim_{N\to\infty}A_{22} = Q_1,\quad \lim_{N\to\infty}Z_{22} = Q_2,$$

and $A$, $Z$ and $C$ all converge to finite matrices,

$$\lim_{N\to\infty}v_{iN} = v_i = f_i(Q_1, Q_2).$$

We have mentioned the limiting distributions of $A$, $Z$ and $C$. The discontinuities of

$$(v_{p-r+1}, v_{p-r+2}, \dots, v_p) = \left(f_{p-r+1}(Q_1,Q_2), f_{p-r+2}(Q_1,Q_2), \dots, f_p(Q_1,Q_2)\right)$$

occur only when two or more of the roots $(v_{p-r+1}, v_{p-r+2}, \dots, v_p)$ are equal; the set of discontinuities has measure 0 since the probability that any two of the roots are equal is zero. By applying Rubin's theorem (see Anderson [1951b]), we have that the asymptotic distribution of $(v_{p-r+1,N}, v_{p-r+2,N}, \dots, v_{pN})$ is the same as the distribution of $(v_{p-r+1}, v_{p-r+2}, \dots, v_p)$. Q.E.D.

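Using Theorem 2 in Anderson [1951b] and the above theorem, Theorem 2.2.2, we conclude the following:

Theorem 2.2.3. Assume that $\lim_{N\to\infty}(N-k)/N = t < 1$ and that Assumptions 1, 2 and 3 hold. Let

$$\mu_{iN} = \frac{(N-k)^{3/2}}{(Nk)^{1/2}}\left(\phi_{iN} - \frac{k}{N-k}\right),\quad i = p-r+1, p-r+2, \dots, p,$$

where $\phi_{iN}$ is the $i$th largest root of $TW^{-1} - I_p$. The limiting distribution of $(\mu_{p-r+1,N}, \mu_{p-r+2,N}, \dots, \mu_{pN})$ has density

$$2^{-r(r+3)/4}\left[\prod_{i=1}^{r}\Gamma\left(\tfrac12(r+1-i)\right)\right]^{-1}\exp\left(-\tfrac14\sum_{i=p-r+1}^{p}\mu_{iN}^2\right)\prod_{i=p-r+1}^{p}\;\prod_{j=i+1}^{p}(\mu_{iN} - \mu_{jN})$$

on the region $\mu_{p-r+1,N} > \mu_{p-r+2,N} > \dots > \mu_{pN}$.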


Part 2. Case 3: $k\to\infty$ as $N\to\infty$ but $\lim_{N\to\infty}(N-k)/N = 1$.

The following theorem will contain the asymptotic distribution of the roots when $k$ goes to infinity as $N$ does and $\lim_{N\to\infty}(N-k)/N = 1$.

Theorem 2.2.4. Assume that $k\to\infty$ as $N\to\infty$, that $\lim_{N\to\infty}(N-k)/N = 1$, and that Assumptions 1, 2 and 3 hold. Let

$$v_{iN} = Nk^{-1/2}\left(\phi_{iN} - \frac{k}{N-k}\right),\quad i = p-r+1, p-r+2, \dots, p,$$

where $\phi_{iN}$ is the $i$th largest root of $TW^{-1} - I_p$. Then the limiting distribution of $(v_{p-r+1,N}, v_{p-r+2,N}, \dots, v_{pN})$ is the same as the distribution of the $r$ roots $v$ of

$$|Q - vI_r| = 0,$$

where $Q$ has an r-dimensional matrix normal distribution (see Lemma 2).

Proof. Because the proof of this theorem is very similar to the proof of Theorem 2.2.2, we will only give an outline of the proof.
By Lemma 1, we only have to consider the distribution of the $r$ smallest roots ($\phi_{iN}$; $i = p-r+1, p-r+2, \dots, p$) of

(2.2.15)  $$\left|(N-k)^{-1}U^*U^{*\prime} + N^{1/2}(N-k)^{-1}C - \phi_N(N-k)^{-1/2}Z + D_1\right| = 0,$$

where $C$, $Z$ and $D_1$ are defined in Lemma 1 and the columns of $U^*$, $V$ have independent normal distributions with mean vector 0 and covariance matrix $I_p$.

Consider the following matrix:

$A = k^{-1/2}(U^*U^{*\prime} - kI_p).$

Substituting $A$ into (2.2.15) yields

(2.2.16)  $$\left|\frac{\sqrt{k}}{N-k}A + \frac{\sqrt{N}}{N-k}C - \phi_N\frac{Z}{\sqrt{N-k}} + D_1 + \frac{k}{N-k}I_p\right| = 0.$$


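We now consider the following variable,

$$\phi_{p-r+i,N} = \frac{k}{N-k} + \frac{v_{p-r+i}}{Nk^{-1/2}}.$$

Substituting the above into (2.2.16), we obtain

(2.2.17)  $$\left|\frac{\sqrt{k}}{N-k}A + \frac{\sqrt{N}}{N-k}C - \phi_{p-r+i,N}\frac{Z}{\sqrt{N-k}} + D_2\right| = 0,$$

where

$$D_2 = \mathrm{diag}\left(\frac{N}{N-k}\gamma_{1N} - \frac{v_{p-r+i}}{Nk^{-1/2}}, \dots, \frac{N}{N-k}\gamma_{p-r,N} - \frac{v_{p-r+i}}{Nk^{-1/2}}, -\frac{v_{p-r+i}}{Nk^{-1/2}}, \dots, -\frac{v_{p-r+i}}{Nk^{-1/2}}\right).$$

If we multiply the last $r$ rows and columns of the matrix inside the determinant of (2.2.17) by $N^{1/2}k^{-1/4}$, and let $N$ go to infinity, we get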
$$\begin{vmatrix}\mathrm{diag}(\gamma_1, \gamma_2, \dots, \gamma_{p-r}) & 0\\ 0 & Q_1 - v_{p-r+i}I_r\end{vmatrix} = 0,$$

where

$$Q_1 = \lim_{N\to\infty}A_{22}\quad\text{(in distribution)},$$

and $A_{22}$ is the lower right-hand $r\times r$ corner of $A$. Therefore, $v_{p-r+i}$ is a root of $|Q_1 - vI_r| = 0$. By Lemma 2, the distribution of $(v_{p-r+1}, v_{p-r+2}, \dots, v_p)$ is the distribution of the roots of

$$|Q_1 - vI_r| = 0,$$

where $Q_1$ has the r-dimensional matrix normal distribution.

All that we have to show is that the $v_{p-r+i,N}$ which gives us equality in

$$\phi_{p-r+i,N} = \frac{k}{N-k} + \frac{v_{p-r+i,N}}{Nk^{-1/2}}$$

goes in law to $v_{p-r+i}$. The demonstration of this fact for Case 3 is the same as for Case 2. We therefore have our theorem. Q.E.D.


If we use Theorem 2 in Anderson [1951b] and Theorem 2.2.4, we can conclude the following:

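Theorem 2.2.5. Assume that $\lim_{N\to\infty}(N-k)/N = 1$, that $k\to\infty$ as $N\to\infty$, and that Assumptions 1, 2 and 3 hold. Let

$$v_{iN} = \left(\phi_{iN} - \frac{k}{N-k}\right)Nk^{-1/2},\quad i = p-r+1, p-r+2, \dots, p,$$

where $\phi_{iN}$ is the $i$th largest root of $TW^{-1} - I_p$. Then the limiting distribution of the set ($v_{iN}$; $i = p-r+1, \dots, p$) has density

$$2^{-r(r+3)/4}\left[\prod_{i=1}^{r}\Gamma\left(\tfrac12(r+1-i)\right)\right]^{-1}\exp\left(-\tfrac14\sum_{i=p-r+1}^{p}v_{iN}^2\right)\prod_{i=p-r+1}^{p}\;\prod_{j=i+1}^{p}(v_{iN} - v_{jN})$$

on the region $v_{p-r+1,N} > v_{p-r+2,N} > \dots > v_{pN}$.

Remark: Theorems 2.2.2 through 2.2.5 are valid when a is the zero matrix.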
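The normalizing constant of the limiting root density can be checked numerically. The sketch below assumes, as in Lemma 2, diagonal variance 2 and off-diagonal variance 1 (so the matrix density is proportional to $\exp(-\mathrm{tr}\,Q^2/4)$). It evaluates $2^{-r(r+3)/4}\left[\prod_{i=1}^{r}\Gamma(\tfrac12(r+1-i))\right]^{-1}$ and integrates the ordered-root density on a grid for $r = 1$ and $r = 2$. The grid limits, step and helper name `root_density_const` are arbitrary illustrative choices.

```python
import math
import numpy as np

def root_density_const(r):
    """Constant for the density of the ordered roots v_1 > ... > v_r of an r x r
    symmetric matrix with N(0, 2) diagonal and N(0, 1) off-diagonal entries."""
    gammas = math.prod(math.gamma((r + 1 - i) / 2) for i in range(1, r + 1))
    return 2.0 ** (-r * (r + 3) / 4) / gammas

h = 0.025
grid = np.arange(-12.0, 12.0 + h / 2, h)

# r = 1: the single root is N(0, 2), so the density integrates to one.
total_1 = root_density_const(1) * np.sum(np.exp(-grid**2 / 4)) * h

# r = 2: integrate C exp(-(v1^2 + v2^2)/4)(v1 - v2) over the ordered region v1 > v2.
v1, v2 = np.meshgrid(grid, grid, indexing="ij")
dens_2 = root_density_const(2) * np.exp(-(v1**2 + v2**2) / 4) * (v1 - v2)
total_2 = np.sum(np.where(v1 > v2, dens_2, 0.0)) * h * h

print(total_1, total_2)   # both approximately 1
```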


2.3. Asymptotic Tests of $B\Xi = \alpha a$ Versus $B\Xi \neq \alpha a$

In this section we use the asymptotic distributions given in the previous section to get asymptotic tests based on the likelihood ratio test statistic. It should be recalled from Theorem 2.1.1 that the likelihood ratio test statistic is given by

$$\Lambda = \prod_{i=p-r+1}^{p}\left(\frac{1}{\lambda_{iN}}\right)^{N/2} = \prod_{i=p-r+1}^{p}\left(\frac{1}{1+\phi_{iN}}\right)^{N/2},$$

where $\lambda_{iN}$ is the $i$th largest eigenvalue of $TW^{-1}$ and $\phi_{iN}$ is the $i$th largest eigenvalue of $TW^{-1} - I_p$.

First, let us consider the case when $k$ is a fixed quantity:

Theorem 2.3.1. (Anderson [1951a]) If our model is given by (2.0.1) and we wish to test the hypothesis $H_0: B\Xi = \alpha a$ versus $H_1: B\Xi \neq \alpha a$, then the asymptotic null distribution of

$$\psi = -2\log\Lambda$$

is a $\chi^2$ distribution with $r(k-s-(p-r))$ degrees of freedom. The $\alpha$-level asymptotic test of $H_0: B\Xi = \alpha a$ versus $H_1: B\Xi \neq \alpha a$ would be to reject the hypothesis $H_0$ when

$$\psi > \chi^2_{r(k-s-p+r)}(1-\alpha),$$

where $\chi^2_d(\beta)$ is the $\beta$th fractile of a $\chi^2$ distribution with $d$ degrees of freedom.
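For fixed $k$, the statistic of Theorem 2.3.1 depends only on the $r$ smallest roots. The hypothetical helpers below (names and simulated sizes are illustrative assumptions, not from the thesis) compute $-2\log\Lambda = N\sum_{i=p-r+1}^{p}\log\lambda_{iN}$ from $T$ and $W$, together with the degrees of freedom $r(k-s-p+r)$; the $\chi^2$ fractile itself would come from a table or a statistics library.

```python
import numpy as np

def lr_statistic(T, W, r, N):
    """-2 log Lambda = N * sum(log lambda_iN) over the r smallest
    eigenvalues lambda_iN of T W^{-1} (equivalently of W^{-1} T)."""
    lam = np.sort(np.linalg.eigvals(np.linalg.solve(W, T)).real)[::-1]
    return N * np.sum(np.log(lam[-r:]))

def chi2_df(p, k, s, r):
    """Degrees of freedom r(k - s - p + r) of the limiting chi-square law."""
    return r * (k - s - p + r)

# Illustrative use with simulated matrices: W = Y2 Y2', T = W + Y3 Y3'.
rng = np.random.default_rng(3)
p, k, s, r, N = 3, 6, 0, 2, 56
Y2 = rng.standard_normal((p, N - k))
Y3 = rng.standard_normal((p, k - s))
W = Y2 @ Y2.T
T = W + Y3 @ Y3.T
print(lr_statistic(T, W, r, N), chi2_df(p, k, s, r))
```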

Remark:  Theorem  2.3.1  holds  when  a is  the  zero  matrix  if  we  let 

s = 0. 

Let us assume we are actually in Case 2, i.e., $\lim_{N\to\infty}(N-k)/N = t < 1$, and we (mistakenly) try to use the test given in Theorem 2.3.1. We now examine what happens to $\psi$ under $H_0$ when $N$ is large. Note that

$$\psi = -2\log\Lambda = -2\log\prod_{i=p-r+1}^{p}\left(\frac{1}{1+\phi_{iN}}\right)^{N/2} = N\sum_{i=p-r+1}^{p}\log(1+\phi_{iN}).$$

Using Theorem 2.2.2 we can show that $1+\phi_{iN}$ goes almost surely to $1/t$ for $i = p-r+1, p-r+2, \dots, p$. We therefore know that $\sum_{i=p-r+1}^{p}\log(1+\phi_{iN})$ goes almost surely to $r\log(1/t)$, which is positive because $t < 1$. Since $Nr\log(1/t)$ goes to positive infinity, we conclude that when $H_0$ is true, $\psi$ gets arbitrarily large in this case as $N$ goes to infinity. If we were to apply the test given in Theorem 2.3.1 for Case 2, our probability of rejecting $H_0$, even when it is true, would approach 1 as $N$ approaches infinity. For Case 3 we have a similar result. We summarize the preceding statements in the following theorem:

Theorem 2.3.2. If our model is given by (2.0.1) and we wish to test $B\Xi = \alpha a$ versus $B\Xi \neq \alpha a$ under the assumption that $k$ goes to infinity as $N$ does, then, when $H_0$ holds, $\psi = -2\log\Lambda$, where $\Lambda$ is the likelihood ratio test statistic, goes almost surely to positive infinity. The test given in Theorem 2.3.1 is meaningless in this case; when $H_0$ holds, we would reject $H_0$ almost surely in large samples.

Since $\psi$ does not have an asymptotic chi-square distribution in either Case 2 or Case 3, we have to derive separate asymptotic tests for Case 2 and for Case 3.

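Assume that we are in Case 2. For this case, we have the following theorem:

Theorem 2.3.3. If our model is given by (2.0.1) and we wish to test the hypothesis $H_0: B\Xi = \alpha a$ versus $H_1: B\Xi \neq \alpha a$ when $\lim_{N\to\infty}(N-k)/N = t < 1$, then the asymptotic null distribution of

$$\left(N(N-k)\right)^{1/2}(2rk)^{-1/2}\left(\left(\frac{N}{N-k}\right)^{r}\Lambda^{2/N} - 1\right),$$

where $\Lambda$ is the likelihood ratio test statistic, is a normal distribution with mean 0 and variance 1. Because large roots $\phi_{iN}$ make $\Lambda$ small, and the limiting normal distribution is symmetric, the asymptotic test of $H_0: B\Xi = \alpha a$ versus $H_1: B\Xi \neq \alpha a$ would be to reject $H_0$ when

$$\left(N(N-k)\right)^{1/2}(2rk)^{-1/2}\left(1 - \left(\frac{N}{N-k}\right)^{r}\Lambda^{2/N}\right) > z_{1-\alpha},$$

and not to reject otherwise, where $z_\beta$ is the $\beta$ fractile of a standard normal distribution.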
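The standardized quantity in Theorem 2.3.3 can be evaluated directly from the $r$ smallest roots $\lambda_{iN}$ of $TW^{-1}$. In the hypothetical helper below (an illustration, not code from the thesis), roots equal to the null centring value $N/(N-k)$ give exactly zero, while larger roots make $\Lambda^{2/N}$, a product of reciprocals $1/\lambda_{iN}$, smaller and push the quantity below zero.

```python
import numpy as np

def case2_statistic(lam_small, N, k):
    """(N(N-k))^{1/2} (2rk)^{-1/2} ((N/(N-k))^r Lambda^{2/N} - 1), where
    Lambda^{2/N} = prod(1/lambda_i) over the r smallest roots lambda_i of T W^{-1}."""
    lam_small = np.asarray(lam_small, dtype=float)
    r = lam_small.size
    lambda_2N = np.prod(1.0 / lam_small)          # Lambda^{2/N}
    scale = np.sqrt(N * (N - k) / (2.0 * r * k))
    return scale * ((N / (N - k)) ** r * lambda_2N - 1.0)

print(case2_statistic([2.0, 2.0], N=100, k=50))   # 0.0: both roots equal N/(N-k) = 2
print(case2_statistic([3.0, 3.0], N=100, k=50))   # negative: larger roots
```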


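Proof: Consider the following sequence of statements:

$$\begin{aligned}
\Lambda^{2/N} &= \prod_{i=p-r+1}^{p}\frac{1}{\lambda_{iN}} = \prod_{i=p-r+1}^{p}\frac{1}{1+\phi_{iN}}\\
&= \prod_{i=p-r+1}^{p}\frac{1}{1 + k/(N-k) + v_{iN}/(N-k)^{1/2}}\\
&= \prod_{i=p-r+1}^{p}\frac{N-k}{N}\cdot\frac{1}{1 + (N-k)^{1/2}N^{-1}v_{iN}}\\
&= \prod_{i=p-r+1}^{p}\frac{N-k}{N}\left(1 - (N-k)^{1/2}N^{-1}v_{iN} + O_p(N^{-1})\right)\\
&= \left(\frac{N-k}{N}\right)^{r}\left(1 - (N-k)^{1/2}N^{-1}\sum_{i=p-r+1}^{p}v_{iN} + O_p(N^{-1})\right).
\end{aligned}$$

The above equality can be written

$$\left(\left(\frac{N}{N-k}\right)^{r}\Lambda^{2/N} - 1\right)\frac{N}{(N-k)^{1/2}} = -\sum_{i=p-r+1}^{p}v_{iN} + O_p(N^{-1/2}).$$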


The asymptotic distribution of $\sum_{i=p-r+1}^{p}v_{iN}$ can be easily obtained from Theorem 2.2.2. The limiting distribution of $\sum_{i=p-r+1}^{p}v_{iN}$ is the distribution of

$$\sum_{i=p-r+1}^{p}v_i = \mathrm{tr}\left((1/t-1)^{1/2}Q_1 - (1/t-1)Q_2\right),$$

because the sum of the roots of $|M - vI_r| = 0$ is $\mathrm{tr}\,M$. The diagonal elements of $Q_1$ and $Q_2$ are all independent, each with a normal distribution with mean 0 and variance 2. Since the trace of a matrix is the sum of its diagonal elements, we know that $\sum_{i=p-r+1}^{p}v_i$ is normally distributed with mean 0 and variance $2r\left[(1/t-1) + (1/t-1)^2\right] = 2r(1/t)(1/t-1)$. Since $N/(N-k)\to 1/t$ and $k/(N-k)\to 1/t-1$, we therefore conclude that

$$(N-k)(2rNk)^{-1/2}\sum_{i=p-r+1}^{p}v_{iN}$$

has an asymptotic normal distribution with mean 0 and variance 1. Finally, multiplying the equality above by $(N-k)(2rNk)^{-1/2}$, we may state that

$$\left(N(N-k)\right)^{1/2}(2rk)^{-1/2}\left(\left(\frac{N}{N-k}\right)^{r}\Lambda^{2/N} - 1\right)$$

has an asymptotic normal distribution with mean 0 and variance 1. Q.E.D.
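The variance $2r(1/t)(1/t-1)$ of the limit of $\sum v_{iN}$ can be illustrated by simulation. This sketch rests on illustrative assumptions, not on the thesis: $s = 0$, $\Sigma = I_p$, a single nonzero eigenvalue ($p - r = 1$) placed in the $(1,1)$ position as in the canonical form of Lemma 1, and finite $N$ and $k$. Agreement is therefore only approximate.

```python
import numpy as np

rng = np.random.default_rng(4)
p, r, N, k, reps = 3, 2, 400, 200, 2000
gamma1 = 5.0                                   # illustrative nonzero eigenvalue of R
t = (N - k) / N
sums = np.empty(reps)
for j in range(reps):
    Y3 = rng.standard_normal((p, k))           # p x (k - s) with s = 0
    Y3[0, 0] += np.sqrt(N * gamma1)            # mean sqrt(N gamma1) in position (1, 1)
    Y2 = rng.standard_normal((p, N - k))
    phi = np.sort(np.linalg.eigvals(np.linalg.solve(Y2 @ Y2.T, Y3 @ Y3.T)).real)[::-1]
    v = np.sqrt(N - k) * (phi[-r:] - k / (N - k))   # v_iN for the r smallest roots
    sums[j] = v.sum()

predicted = 2 * r * (1 / t) * (1 / t - 1)      # equals 8 for t = 1/2, r = 2
ratio = sums.var() / predicted
print(ratio)                                   # near 1 for large N
```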


We now talk about the case where $k\to\infty$ as $N\to\infty$, but $\lim_{N\to\infty}(N-k)/N = 1$. In this case we have the following theorem:

Theorem 2.3.4. If our model is given by (2.0.1) and we wish to test the hypothesis $H_0: B\Xi = \alpha a$ versus $H_1: B\Xi \neq \alpha a$ when $\lim_{N\to\infty}(N-k)/N = t = 1$ and $k\to\infty$ as $N\to\infty$, then the asymptotic null distribution of

$$N(2rk)^{-1/2}\left(\left(\frac{N}{N-k}\right)^{r}\Lambda^{2/N} - 1\right)$$

is a normal distribution with mean 0 and variance 1. Because large roots make $\Lambda$ small, and the limiting normal distribution is symmetric, the asymptotic test of $H_0: B\Xi = \alpha a$ versus $H_1: B\Xi \neq \alpha a$ would be to reject $H_0$ when

$$N(2rk)^{-1/2}\left(1 - \left(\frac{N}{N-k}\right)^{r}\Lambda^{2/N}\right) > z_{1-\alpha},$$

and not to reject otherwise, where $z_\beta$ is the $\beta$ fractile of a standard normal distribution.
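For Case 3 only the normalizing factor changes. A hypothetical helper in the same spirit (names and values are illustrative assumptions) is sketched below. When $k/N\to 0$, its scaling $N(2rk)^{-1/2}$ and the Case 2 scaling $(N(N-k))^{1/2}(2rk)^{-1/2}$ agree to first order, since $N/(N(N-k))^{1/2} = (N/(N-k))^{1/2}\to 1$.

```python
import numpy as np

def case3_statistic(lam_small, N, k):
    """N (2rk)^{-1/2} ((N/(N-k))^r Lambda^{2/N} - 1), where
    Lambda^{2/N} = prod(1/lambda_i) over the r smallest roots lambda_i of T W^{-1}."""
    lam_small = np.asarray(lam_small, dtype=float)
    r = lam_small.size
    lambda_2N = np.prod(1.0 / lam_small)
    return N / np.sqrt(2.0 * r * k) * ((N / (N - k)) ** r * lambda_2N - 1.0)

N, k = 10_000, 25
center = N / (N - k)                      # null centring value of each small root
print(case3_statistic([center, center], N, k))          # approximately 0
print(case3_statistic([1.2 * center, center], N, k))    # negative
```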
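Proof: Consider the following sequence of equations:

$$\begin{aligned}
\Lambda^{2/N} &= \prod_{i=p-r+1}^{p}\frac{1}{\lambda_{iN}} = \prod_{i=p-r+1}^{p}\frac{1}{1+\phi_{iN}}\\
&= \prod_{i=p-r+1}^{p}\frac{1}{1 + k/(N-k) + v_{iN}/(Nk^{-1/2})}\\
&= \prod_{i=p-r+1}^{p}\frac{N-k}{N}\cdot\frac{1}{1 + (N-k)k^{1/2}N^{-2}v_{iN}}\\
&= \left(\frac{N-k}{N}\right)^{r}\prod_{i=p-r+1}^{p}\left(1 - k^{1/2}N^{-1}v_{iN} + o_p(k^{1/2}N^{-1})\right)\\
&= \left(\frac{N-k}{N}\right)^{r}\left(1 - k^{1/2}N^{-1}\sum_{i=p-r+1}^{p}v_{iN} + o_p(k^{1/2}N^{-1})\right).
\end{aligned}$$

The above equality may be written

$$Nk^{-1/2}\left(\left(\frac{N}{N-k}\right)^{r}\Lambda^{2/N} - 1\right) = -\sum_{i=p-r+1}^{p}v_{iN} + o_p(1).$$

The asymptotic null distribution of $\sum_{i=p-r+1}^{p}v_{iN}$ can be obtained using Theorem 2.2.4. The limiting distribution of $\sum_{i=p-r+1}^{p}v_{iN}$ is the distribution of the trace of $Q$, which has a normal distribution with mean 0 and variance $2r$ (the sum of $r$ independent diagonal elements, each of variance 2). The theorem now follows. Q.E.D.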


Remark: When a is the zero matrix, Theorems 2.3.2 through 2.3.4 are all still valid.
2.4. Consistency of the Tests

In this section, we discuss the consistency of the tests from the preceding section, i.e., we show that the power of the tests goes to one as the sample size increases when a fixed alternative is assumed to be true. We will use the following theorem to show the consistency of the tests.

Theorem 2.4.1. Assume that an $\alpha$-level asymptotic test rejects when a test statistic is greater than a constant and does not reject otherwise. If the test statistic goes to infinity almost surely as $N$ does for a fixed alternative, the test is consistent.

Proof. This theorem follows from the definition of a consistent test. Q.E.D.

In the next theorem, we discuss the consistency of the test given in Theorem 2.3.1.

Theorem 2.4.2. If $N^{-1}F\left(I_N - F'a'(aFF'a')^{-1}aF\right)F'$ goes to a finite matrix of full rank, the test given in Theorem 2.3.1 is consistent.

Proof. For our fixed alternative, let us consider $\Xi = \Xi_0$, where $\Xi_0$ is a matrix whose row rank is greater than $p-r$. Since we assumed that $N^{-1}F\left(I_N - F'a'(aFF'a')^{-1}aF\right)F'$ goes to a finite matrix of full rank, the matrix

$$N^{-1}\Xi_0F\left(I_N - F'a'(aFF'a')^{-1}aF\right)F'\Xi_0'$$

goes to a matrix $R_0$ of rank greater than $p-r$. We can show (see the proof of Theorem 1.2.1) that $W^{-1}T$ converges almost surely to $(1/t)(I_p + \Sigma^{-1}R_0)$, and that the $r$th smallest eigenvalue of $W^{-1}T$ goes almost surely to the $r$th smallest eigenvalue of $(1/t)(I_p + \Sigma^{-1}R_0)$. In this case, $t = 1$. Since $R_0$ is of rank greater than $p-r$, the $r$th smallest eigenvalue of $W^{-1}T - I_p$ goes almost surely to a number greater than 0. We therefore have that

$$\lambda_{pN}\lambda_{p-1,N}\cdots\lambda_{p-r+1,N}$$

goes almost surely to a number greater than one. We can now state that

$$-2\log\Lambda = -2\log\left(\lambda_{pN}\lambda_{p-1,N}\cdots\lambda_{p-r+1,N}\right)^{-N/2} = N\log\left(\lambda_{pN}\lambda_{p-1,N}\cdots\lambda_{p-r+1,N}\right)$$

goes almost surely to positive infinity. The theorem follows through an application of Theorem 2.4.1. Q.E.D.

For Case 2 and Case 3, we have to change what our fixed alternative is. In these cases, the number of parameters is assumed to increase with the sample size. It is fairly evident that the fixed alternative we picked when $k$ is fixed makes no sense for Case 2 or Case 3.

We now describe what our fixed alternative will be. For each $N$, let us pick $\Xi = \Xi_N$ so that the $r$th smallest eigenvalue of

(2.4.1)  $N^{-1}\Xi_NF\left(I_N - F'a'(aFF'a')^{-1}aF\right)F'\Xi_N'$

is fixed at $\gamma_0 > 0$. We are fixing the $r$th smallest eigenvalue of the noncentrality parameter of $T$. Let us also pick $\Xi_N$ so that the matrix given by (2.4.1) converges to a finite matrix $R_0$.


We now consider the following theorem, which is concerned with the asymptotic test for Case 2.

Theorem 2.4.3. The asymptotic test given in Theorem 2.3.3 (Case 2) is consistent.

Proof. We can show (see the proof of Theorem 1.2.1) that $W^{-1}T$ converges almost surely to $(1/t)(I_p + \Sigma^{-1}R)$ and that the $r$th smallest eigenvalue of $W^{-1}T$ goes almost surely to the $r$th smallest eigenvalue of $(1/t)(I_p + \Sigma^{-1}R)$. For our fixed alternative (see the paragraph preceding this theorem), $R = R_0$, and the $r$th smallest eigenvalue of $(1/t)(I_p + \Sigma^{-1}R_0)$ is greater than $1/t$, while none of its eigenvalues is smaller than $1/t$. We know that

$$\Lambda^{-2/N} = \prod_{i=p-r+1}^{p}\lambda_{iN},$$

so $((N-k)/N)^{r}\Lambda^{-2/N}$ goes almost surely to a quantity greater than one; equivalently, $(N/(N-k))^{r}\Lambda^{2/N}$ goes almost surely to a quantity less than one. Therefore, since $(N(N-k)/k)^{1/2}$ goes to infinity as $N$ does,

$$\left(N(N-k)/(2rk)\right)^{1/2}\left(1 - \left(\frac{N}{N-k}\right)^{r}\Lambda^{2/N}\right)$$

goes almost surely to positive infinity. The theorem follows after we apply Theorem 2.4.1. Q.E.D.


For Case 3, we have a similar result:

Theorem 2.4.4. The asymptotic test given in Theorem 2.3.4 (Case 3) is consistent.

Proof. We omit the proof since it is almost identical to the proof of Theorem 2.4.3.


CHAPTER  3 

ESTIMATION  OF  UNKNOWN  LINEAR  RESTRICTIONS  ON 
THE  PARAMETERS  OF  A GENERAL  LINEAR  MODEL 

3.0 Introduction

In this chapter, we discuss a very general linear model called the Potthoff-Roy model. This model can be formulated in the following matrix equation:

(3.0.1)  $X = F_1\,\Xi\,F_2 + E,$

where $X$ is a $c\times N$ matrix of observed values, $F_1$ is a known $c\times p$ ($c \ge p$) matrix, $\Xi$ is an unknown $p\times m$ matrix, $F_2$ is a known $m\times N$ ($N > m$) matrix, and $E$ is a $c\times N$ matrix of errors. The columns of $E$ are independent with the same normal distribution having mean vector 0 and covariance matrix $\Sigma$. We require that $F_1$ and $F_2$ are of full column rank and full row rank, respectively.

The classical multivariate linear regression model can be seen to be a special case of the Potthoff-Roy model by letting $F_1 = I_c$. If we let $F_2 = (1, 1, \dots, 1)$, then the Potthoff-Roy model reduces to a simple "growth curves" model (Gleser and Olkin [1964]). Estimation of the parameters in the Potthoff-Roy model under various hypotheses has been discussed by Potthoff and Roy [1964], Rao [1965], and Gleser and Olkin [1969].


We want to find the MLE's of $\Xi$, and of two other matrices $U_1$ and $\alpha$ which satisfy

(3.0.2)  $U_1\,\Xi\,F_3 = \alpha b,$

where $U_1$ is an unknown $r\times p$ ($r < p$) matrix, $F_3$ is a known $m\times k$ ($m \ge k$) matrix, $\alpha$ is an unknown $r\times s$ ($s \le r$) matrix, and $b$ is a known $s\times k$ matrix. Throughout this chapter, we assume that

(3.0.3)  $\Sigma = \sigma^2 I_c,$

where $\sigma^2$ is an unknown constant.

In  Section  3.1,  we  reduce  our  model  (3.0.1)  and  hypothesis 
(3.0.2)  to  a canonical  form.  Section  3.2  contains  a derivation 
of  the  MLE's  for  the  reduced  model,  and  also  gives  the  MLE's  for 
the  general  model.  Section  3.3  discusses  several  special  cases 
of  our  reduced  model.  In  Section  3.4,  we  consider  consistency  of 
the  estimators  in  our  models. 

3.1. Reduction of the Model to a Canonical Form

Consider the following model and hypothesis:

(3.1.1)  $X = F_1\,\Xi\,F_2 + E,$

(3.1.2)  $U_1\,\Xi\,F_3 = \alpha b,$

where $X$, $F_1$, $\Xi$, $F_2$, $E$, $U_1$, $F_3$, $\alpha$, and $b$ are defined in the introduction to this chapter. In this section, we reduce (3.1.1) and (3.1.2) to a simpler, or canonical, form.
Let us discuss the following transformation:

(3.1.3)  $$Y = \begin{pmatrix}Y_1\\ Y_2\end{pmatrix} = \begin{pmatrix}(F_1'F_1)^{-1/2}F_1'\\ V_1'\end{pmatrix}X,$$

where $V_1$ is a $c\times(c-p)$ column orthogonal matrix which satisfies $V_1'F_1 = 0$. Whenever we write the square root (or negative square root) of a matrix, we mean the unique, symmetric, positive definite square root. The columns of $Y$ are independently distributed with a normal distribution having covariance matrix $\sigma^2 I_c$. The mean of $Y$ is

$$E(Y) = E\left[\begin{pmatrix}(F_1'F_1)^{-1/2}F_1'\\ V_1'\end{pmatrix}X\right] = \begin{pmatrix}(F_1'F_1)^{1/2}\,\Xi\,F_2\\ 0\end{pmatrix}.$$

Let

$$X^* = \begin{pmatrix}X_1^* & X_3^*\\ X_2^* & X_4^*\end{pmatrix} = \begin{pmatrix}Y_1\\ Y_2\end{pmatrix}\left(F_2'(F_2F_2')^{-1/2},\ V_2\right),$$

where $V_2$ is an $N\times(N-m)$ column orthogonal matrix which satisfies $V_2'F_2' = 0$. By Theorem 3.3.1 of Anderson [1958], the columns of $X^*$ will have independent normal distributions with covariance matrix $\sigma^2 I_c$. The mean of $X^*$ is

(3.1.4)  $$E(X^*) = E\begin{pmatrix}Y_1\\ Y_2\end{pmatrix}\left(F_2'(F_2F_2')^{-1/2},\ V_2\right) = \begin{pmatrix}(F_1'F_1)^{1/2}\,\Xi\,(F_2F_2')^{1/2} & 0\\ 0 & 0\end{pmatrix}.$$

If we let

(3.1.5)  $\Xi^* = (F_1'F_1)^{1/2}\,\Xi\,(F_2F_2')^{1/2},$

our hypothesis, in terms of $\Xi^*$, becomes:
(3.1.6)  $U_1(F_1'F_1)^{-1/2}\,\Xi^*\,(F_2F_2')^{-1/2}F_3 = \alpha b.$

We will now make several substitutions which will make our hypothesis (3.1.6) simpler. Let

(3.1.7)  $$U_2 = U_1(F_1'F_1)^{-1/2},\qquad F_4 = (F_2F_2')^{-1/2}F_3\left(F_3'(F_2F_2')^{-1}F_3\right)^{-1/2},\qquad d = b\left(F_3'(F_2F_2')^{-1}F_3\right)^{-1/2}.$$

Multiplying (3.1.6) on the right by $\left(F_3'(F_2F_2')^{-1}F_3\right)^{-1/2}$, our hypothesis becomes:

(3.1.8)  $U_2\,\Xi^*F_4 = \alpha d,$

where $F_4$ is a known column orthogonal matrix.
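The reduction (3.1.5) to (3.1.8) can be checked numerically. In the sketch below, all matrices are random and the helper `sym_power` is a hypothetical name (illustrative assumptions, not the author's code). It forms $F_4$ and $d$ from (3.1.7), confirms that $F_4'F_4 = I_k$, and checks that for arbitrary $\Xi$, $U_1$, $\alpha$ the discrepancy $U_2\Xi^*F_4 - \alpha d$ equals $(U_1\Xi F_3 - \alpha b)\left(F_3'(F_2F_2')^{-1}F_3\right)^{-1/2}$. Hence one side vanishes exactly when the other does.

```python
import numpy as np

def sym_power(S, power):
    """Power of a symmetric positive definite matrix (symmetric root when power = +-1/2)."""
    w, Q = np.linalg.eigh(S)
    return (Q * w**power) @ Q.T

rng = np.random.default_rng(5)
c, p, m, N, k, r, s = 6, 3, 5, 30, 4, 2, 1     # c >= p, m >= k, N > m
F1 = rng.standard_normal((c, p))
F2 = rng.standard_normal((m, N))
F3 = rng.standard_normal((m, k))
b = rng.standard_normal((s, k))

S = F3.T @ np.linalg.inv(F2 @ F2.T) @ F3               # F3'(F2F2')^{-1}F3, k x k
F4 = sym_power(F2 @ F2.T, -0.5) @ F3 @ sym_power(S, -0.5)
d = b @ sym_power(S, -0.5)

# Arbitrary parameter values, used only to compare the two forms of the hypothesis.
Xi = rng.standard_normal((p, m))
U1 = rng.standard_normal((r, p))
alpha = rng.standard_normal((r, s))
Xi_star = sym_power(F1.T @ F1, 0.5) @ Xi @ sym_power(F2 @ F2.T, 0.5)   # (3.1.5)
U2 = U1 @ sym_power(F1.T @ F1, -0.5)                                    # (3.1.7)

gap_orig = U1 @ Xi @ F3 - alpha @ b          # zero exactly when (3.0.2) holds
gap_canon = U2 @ Xi_star @ F4 - alpha @ d    # zero exactly when (3.1.8) holds
print(np.allclose(gap_canon, gap_orig @ sym_power(S, -0.5)))   # True
```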

We now write the joint distribution of $X_1^*$, $X_2^*$, $X_3^*$ and $X_4^*$:

(3.1.9)  $$f(X_1^*, X_2^*, X_3^*, X_4^*) = (2\pi\sigma^2)^{-Nc/2}\exp\left(-\frac{1}{2\sigma^2}\left[\mathrm{tr}(X_1^*-\Xi^*)(X_1^*-\Xi^*)' + \mathrm{tr}\,X_2^*X_2^{*\prime} + \mathrm{tr}\,X_3^*X_3^{*\prime} + \mathrm{tr}\,X_4^*X_4^{*\prime}\right]\right).$$

From a quick examination of (3.1.9), we conclude that we could get the MLE ($\hat\sigma^2$) of $\sigma^2$ if we knew the MLE ($\hat\Xi^*$) of $\Xi^*$. Our result would be

(3.1.10)  $$\hat\sigma^2 = \frac{1}{Nc}\left[\mathrm{tr}(X_1^*-\hat\Xi^*)(X_1^*-\hat\Xi^*)' + \mathrm{tr}\,X_2^*X_2^{*\prime} + \mathrm{tr}\,X_3^*X_3^{*\prime} + \mathrm{tr}\,X_4^*X_4^{*\prime}\right].$$

From (3.1.9), we also know that $X_1^*$ is a sufficient statistic for $U_2$ and $\alpha$ when $\sigma^2$ is treated as a fixed quantity. It is clear that finding the estimators of $U_2$, $\alpha$, and $\Xi^*$ which satisfy (3.1.8) and which maximize (3.1.9) is equivalent to finding the estimators of $U_2$, $\alpha$, $\Xi^*$ which satisfy (3.1.8) and which minimize

$$\mathrm{tr}(X_1^* - \Xi^*)(X_1^* - \Xi^*)'.$$

Therefore, we need only consider functions of $X_1^*$ when we find MLE's of $U_2$, $\alpha$, and $\Xi^*$.

We have reduced our estimation problem to the following problem. Let our model be

(3.1.11)  $X_1^* = \Xi^* + E^*,$

where $X_1^*$ is a $p\times m$ matrix of observed values, $\Xi^*$ is an unknown $p\times m$ ($p < m$) matrix, and $E^*$ is a $p\times m$ error matrix. Each column of $E^*$ is distributed independently as a p-dimensional normal distribution with mean vector 0 and covariance matrix $\sigma^2 I_p$, where $\sigma^2$ is unknown. We want the MLE's of $\Xi^*$, and of two other matrices $U_2$ and $\alpha$ which satisfy

(3.1.12)  $U_2\,\Xi^*F_4 = \alpha d,$

where $U_2$ is an unknown $r\times p$ matrix, $F_4$ is a known $m\times k$ column orthogonal matrix, $\alpha$ is an unknown $r\times s$ matrix and $d$ is a known $s\times k$ matrix. We refer to (3.1.11) and (3.1.12) as either the reduced model or the model in canonical form. Note that $s \le r < p \le k$ and $k - s \ge p$.

In the next section, we will find the MLE's of the parameters in the reduced model. We will also use the MLE's of $\Xi^*$, $\alpha$, and $U_2$ in the reduced model to get the MLE's of $\Xi$, $\alpha$, and $U_1$ in the general model (3.1.1) and (3.1.2). It should be noted that the MLE of $\sigma^2$ for the reduced model is not the MLE of $\sigma^2$ for the general model. Equation (3.1.10) gives us the MLE of $\sigma^2$ for the general model.
3.2. Maximum Likelihood Estimators for the Model in Canonical Form

In this section, we will get the MLE's for the parameters of the reduced model described at the end of Section 3.1. We will also give the MLE's for the parameters of the general model (3.1.1), (3.1.2).

Let our model be the model described at the end of Section 3.1. As in Chapter 1, it is clear that if we find one set of MLE's $(\hat U_2, \hat\alpha)$ of $(U_2, \alpha)$, then $(A\hat U_2, A\hat\alpha)$ is also a set of MLE's of $(U_2, \alpha)$, where $A$ is any invertible $r\times r$ matrix. Because of this, we will require that $U_2$ be row orthogonal.

The method of finding the MLE's of $U_2$, $\Xi^*$, $\alpha$ and $\sigma^2$ will be similar to what we did in Chapter 1. We will 1) fix $U_2$ and $\sigma^2$; 2) find the MLE's of $\Xi^*$ and $\alpha$ as functions of the fixed values of $U_2$ and $\sigma^2$; 3) substitute this estimate of $\Xi^*$ back into the likelihood; and 4) find the maximum likelihood estimators of $U_2$ and $\sigma^2$.

Part 1. $U_2$ and $\sigma^2$ fixed or given

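We will now transform $X_1^*$ into a form in which the estimators of $\Xi^*$ and $\alpha$ are easy to see. Let

(3.2.1)  $$P = \begin{pmatrix}P_1\\ P_2\end{pmatrix} = \begin{pmatrix}U_2\\ V_4\end{pmatrix}X_1^*,$$

where $V_4$ is a $(p-r)\times p$ row orthogonal matrix which satisfies $V_4U_2' = 0$. Each column of $P$ has an independent p-dimensional normal distribution with covariance matrix $\sigma^2 I_p$. The mean of $P$ is

(3.2.2)  $$E(P) = E\left[\begin{pmatrix}U_2\\ V_4\end{pmatrix}X_1^*\right] = \begin{pmatrix}U_2\\ V_4\end{pmatrix}E(X_1^*) = \begin{pmatrix}U_2\,\Xi^*\\ V_4\,\Xi^*\end{pmatrix}.$$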
Let

(3.2.3)  $$R = \begin{pmatrix}R_1 & R_2\\ R_3 & R_4\end{pmatrix} = P(F_4, V_5) = \begin{pmatrix}P_1F_4 & P_1V_5\\ P_2F_4 & P_2V_5\end{pmatrix},$$

where $V_5$ is an $m\times(m-k)$ column orthogonal matrix that makes $(F_4, V_5)$ an orthogonal matrix. By Theorem 3.3.1 in Anderson [1958], the columns of $R$ have independent p-dimensional normal distributions with covariance matrix $\sigma^2 I_p$. The mean of $R$ is

(3.2.4)  $$E(R) = \begin{pmatrix}U_2\Xi^*F_4 & U_2\Xi^*V_5\\ V_4\Xi^*F_4 & V_4\Xi^*V_5\end{pmatrix} = \begin{pmatrix}\alpha d & U_2\Xi^*V_5\\ V_4\Xi^*F_4 & V_4\Xi^*V_5\end{pmatrix}.$$

From the above expression it is easy to get the MLE's. Since all elements of $R$ are distributed independently, we have that the MLE of $V_4\Xi^*F_4$ is $R_3$, of $U_2\Xi^*V_5$ is $R_2$, and of $V_4\Xi^*V_5$ is $R_4$. We can apply a standard theorem in multivariate regression to get the MLE of $\alpha$:

(3.2.5)  $\hat\alpha = R_1d'(dd')^{-1}.$

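We now can get the MLE of $\Xi^*$:

(3.2.6)  $$\hat\Xi^* = \begin{pmatrix}U_2\\ V_4\end{pmatrix}^{-1}\begin{pmatrix}R_1d'(dd')^{-1}d & R_2\\ R_3 & R_4\end{pmatrix}(F_4, V_5)^{-1} = (U_2', V_4')\begin{pmatrix}R_1d'(dd')^{-1}d & R_2\\ R_3 & R_4\end{pmatrix}\begin{pmatrix}F_4'\\ V_5'\end{pmatrix}.$$

If we go backwards, using first (3.2.3) and then (3.2.1), we get

$$\hat\Xi^* = U_2'U_2X_1^*F_4d'(dd')^{-1}dF_4' + V_4'V_4X_1^*F_4F_4' + U_2'U_2X_1^*V_5V_5' + V_4'V_4X_1^*V_5V_5'.$$

Using the facts that $I_p - V_4'V_4 = U_2'U_2$ and $I_m - V_5V_5' = F_4F_4'$, we get

$$\hat\Xi^* = X_1^* - U_2'U_2X_1^*F_4\left(I_k - d'(dd')^{-1}d\right)F_4',\qquad \hat\alpha = U_2X_1^*F_4d'(dd')^{-1}.$$

It should be noted that neither $\hat\Xi^*$ nor $\hat\alpha$ is a function of $\sigma^2$.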

We summarize our results so far.

Theorem 3.2.1. If our model is $X_1^* = \Xi^* + E^*$, where each column of $E^*$ is distributed independently with a p-dimensional normal distribution having mean vector 0 and covariance matrix $\sigma^2 I_p$ ($\sigma^2$ is a fixed quantity), then the MLE's of $\Xi^*$ and $\alpha$ which satisfy the hypothesis $U_2\Xi^*F_4 = \alpha d$, where $U_2$ is a fixed $r\times p$ row orthogonal matrix, $F_4$ is a known $m\times k$ column orthogonal matrix and $d$ is a known $s\times k$ matrix, are

(3.2.7)  $\hat\Xi^* = X_1^* - U_2'U_2X_1^*F_4\left(I_k - d'(dd')^{-1}d\right)F_4',$

(3.2.8)  $\hat\alpha = U_2X_1^*F_4d'(dd')^{-1}.$

Part 2. Substitution of our derived MLE's back into the likelihood and maximization with respect to $U_2$, $\sigma^2$.

In this part, we find the MLE's of $U_2$ and $\sigma^2$ using $\hat\Xi^*$, $\hat\alpha$ as defined by (3.2.7) and (3.2.8). We now write the distribution of $X_1^*$ after substituting $\hat\Xi^*$ for $\Xi^*$:


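(3.2.9) $f(X_1^*; \hat\Xi^*, \sigma^2, U_2) = (2\pi\sigma^2)^{-\frac12 mp}\exp\{-(2\sigma^2)^{-1}\operatorname{tr}(X_1^*-\hat\Xi^*)(X_1^*-\hat\Xi^*)'\}.$

If we want to maximize (3.2.9), all we have to do is minimize

$Q = \operatorname{tr}(X_1^*-\hat\Xi^*)(X_1^*-\hat\Xi^*)'$

$\quad = \operatorname{tr}\bigl(U_2'U_2X_1^*F_4(I_k-d'(dd')^{-1}d)F_4'\bigr)\bigl(U_2'U_2X_1^*F_4(I_k-d'(dd')^{-1}d)F_4'\bigr)'$

(3.2.10) $\quad = \operatorname{tr}\bigl(U_2X_1^*F_4(I_k-d'(dd')^{-1}d)F_4'X_1^{*\prime}U_2'\bigr),$

where the last step uses $F_4'F_4 = I_k$, $U_2U_2' = I_r$, and the idempotency of $I_k-d'(dd')^{-1}d$.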

Minimizing Q subject to the condition that $U_2$ is row orthogonal (i.e., $U_2U_2' = I_r$) is a straightforward application of the Courant-Fischer min-max theorem (see Bellman [1970]). If we let the rows of $U_2$ be the orthonormal eigenvectors associated with the r smallest eigenvalues of

(3.2.11) $M = X_1^*F_4(I_k-d'(dd')^{-1}d)F_4'X_1^{*\prime},$

then $U_2$ minimizes Q and therefore is the MLE of $U_2$. The minimum value of Q is $\sum_{i=p-r+1}^{p}\lambda_i$, where $\lambda_i$ is the ith largest eigenvalue of M.
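The Courant-Fischer step can be illustrated numerically. This is a sketch under stated assumptions: M is an arbitrary symmetric nonnegative definite matrix generated at random, and `min_trace` is a hypothetical helper name. It takes the rows of $U_2$ from the eigenvectors of the r smallest eigenvalues and compares the resulting value of $\operatorname{tr}(U_2MU_2')$ with randomly drawn row-orthonormal competitors.

```python
import numpy as np

def min_trace(M, r):
    """Minimize tr(U M U') over r x p matrices U with U U' = I_r (M symmetric)."""
    evals, evecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    U = evecs[:, :r].T                 # rows: eigenvectors of the r smallest
    return U, evals[:r].sum()

rng = np.random.default_rng(1)
p, r = 5, 2
A = rng.standard_normal((p, 12))
M = A @ A.T                            # random symmetric nonnegative definite M
U_opt, q_min = min_trace(M, r)

# Random row-orthonormal competitors should never do better.
q_random = []
for _ in range(200):
    U = np.linalg.qr(rng.standard_normal((p, r)))[0].T
    q_random.append(np.trace(U @ M @ U.T))
```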

At this point we should discuss zero eigenvalues of M. Since the rank of $I_k-d'(dd')^{-1}d$ is $k-s$, M will have full rank with probability one if and only if $k-s \ge p$; i.e., M will have zero eigenvalues with probability one if and only if $k-s < p$. In all cases in the succeeding sections, we assume that $k-s \ge p$.

The MLE of $\sigma^2$ is easy to get since we know the minimum value of Q. The MLE ($\hat\sigma^2$) of $\sigma^2$ in our reduced model is

$\hat\sigma^2 = \frac{1}{mp}\sum_{i=p-r+1}^{p}\lambda_i,$

where $\lambda_i$ is the ith largest eigenvalue of M.


Let  us  summarize  our  results  in  the  following  theorem: 


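Theorem 3.2.2. The MLE's of $U_2$, $\alpha$, $\Xi^*$, and $\sigma^2$ in the reduced model (3.1.11), (3.1.12) are:

$\hat\alpha = \hat U_2X_1^*F_4d'(dd')^{-1},$

$\hat\Xi^* = X_1^* - \hat U_2'\hat U_2X_1^*F_4(I_k-d'(dd')^{-1}d)F_4',$

$\hat\sigma^2 = \frac{1}{mp}\sum_{i=p-r+1}^{p}\lambda_i,$

where $\lambda_i$ is the ith largest eigenvalue of M, the rows of $\hat U_2$ are the (orthonormal) eigenvectors associated with the r smallest eigenvalues of M, and

$M = X_1^*F_4(I_k-d'(dd')^{-1}d)F_4'X_1^{*\prime}.$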


Remark I. If we multiply $\hat U_2$ and $\hat\alpha$ on the left by any invertible matrix, the resulting matrices are also MLE's.

Remark II. All matrices which are MLE's of $U_2$, $\alpha$ are of the form $H\hat U_2$, $H\hat\alpha$ for some invertible matrix H.
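Theorem 3.2.2 translates directly into a few lines of linear algebra. The sketch below is our own illustration (Python/NumPy; the function name `reduced_model_mle` and the random inputs are assumptions, not part of the text). It returns $\hat U_2$ with orthonormal rows, $\hat\alpha$, $\hat\Xi^*$, and $\hat\sigma^2$.

```python
import numpy as np

def reduced_model_mle(X1, F4, d, r):
    """MLEs of Theorem 3.2.2 for X1 = Xi + E with U2 Xi F4 = alpha d.

    X1 : p x m observations (X_1^*), F4 : m x k with F4'F4 = I_k,
    d  : known s x k matrix, r : number of unknown linear restrictions.
    """
    p, m = X1.shape
    k = F4.shape[1]
    dd_inv = np.linalg.inv(d @ d.T)
    P = np.eye(k) - d.T @ dd_inv @ d          # I_k - d'(dd')^{-1} d, rank k - s
    M = X1 @ F4 @ P @ F4.T @ X1.T             # the matrix M of (3.2.11)
    evals, evecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    U2 = evecs[:, :r].T                       # rows: eigenvectors of the r smallest
    alpha = U2 @ X1 @ F4 @ d.T @ dd_inv       # (3.2.8) with U2 replaced by its MLE
    Xi = X1 - U2.T @ U2 @ X1 @ F4 @ P @ F4.T  # (3.2.7) with U2 replaced by its MLE
    sigma2 = evals[:r].sum() / (m * p)        # (1/mp) * sum of the r smallest
    return U2, alpha, Xi, sigma2

# Illustrative random inputs (k - s >= p, as assumed in the text).
rng = np.random.default_rng(2)
p, r, m, k, s = 4, 1, 30, 10, 2
X1 = rng.standard_normal((p, m))
F4 = np.linalg.qr(rng.standard_normal((m, k)))[0]
d = rng.standard_normal((s, k))
U2, alpha, Xi, sigma2 = reduced_model_mle(X1, F4, d, r)
```

Because `numpy.linalg.eigh` returns eigenvalues in ascending order, the first r eigenvector columns correspond to $\lambda_{p-r+1}, \ldots, \lambda_p$. By Remark I, any invertible left multiple of the returned $\hat U_2$ and $\hat\alpha$ would serve equally well.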

Theorem 3.2.2 gives us the MLE's of the parameters in our reduced model. If we use the MLE's of $\Xi^*$, $U_2$, and $\alpha$ given in Theorem 3.2.2 for our reduced model, and also use (3.1.3), (3.1.4), (3.1.5), and (3.1.?), we can get the MLE's of $\Xi$, $U_1$, and $\sigma^2$ in the general model.

Recall that the MLE of $\sigma^2$ in the general model is given by (3.1.10).


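Following (3.2.11), we found that the minimum value of $\operatorname{tr}(X_1^*-\hat\Xi^*)(X_1^*-\hat\Xi^*)'$ is $\sum_{i=p-r+1}^{p}\lambda_i$, where $\lambda_i$ is the ith largest eigenvalue of M, which is defined in Theorem 3.2.2. If we use the definitions of $X_2^*$, $X_3^*$, and $X_4^*$, we get

$\operatorname{tr}X_2^*X_2^{*\prime} + \operatorname{tr}X_3^*X_3^{*\prime} + \operatorname{tr}X_4^*X_4^{*\prime} = \operatorname{tr}(XX' - F_1(F_1'F_1)^{-1}F_1'XF_2'(F_2F_2')^{-1}F_2X').$

Combining the preceding arguments, we finally have

$\hat\sigma^2 = \frac{1}{cN}\Bigl(\sum_{i=p-r+1}^{p}\lambda_i + \operatorname{tr}(XX' - F_1(F_1'F_1)^{-1}F_1'XF_2'(F_2F_2')^{-1}F_2X')\Bigr),$

where $\lambda_i$ is the ith largest eigenvalue of M.

We now give the MLE's of $U_1$, $\alpha$, $\Xi$, and $\sigma^2$ for the general model in the following theorem.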

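Theorem 3.2.3. The MLE's of $U_1$, $\alpha$, $\Xi$, and $\sigma^2$ in the general model (3.1.1), (3.1.2) are:

$\hat U_1 = \hat U_2(F_1'F_1)^{1/2},$

$\hat\alpha = \hat U_2X_1^*F_4d'(dd')^{-1},$

$\hat\Xi = (F_1'F_1)^{-1/2}\hat\Xi^*(F_2F_2')^{-1/2},$

$\hat\sigma^2 = \frac{1}{cN}\Bigl(\sum_{i=p-r+1}^{p}\lambda_i + \operatorname{tr}(XX' - F_1(F_1'F_1)^{-1}F_1'XF_2'(F_2F_2')^{-1}F_2X')\Bigr),$

where $\lambda_i$ is the ith largest eigenvalue of M, the rows of $\hat U_2$ are the eigenvectors associated with the r smallest eigenvalues of M,

$M = X_1^*F_4(I_k-d'(dd')^{-1}d)F_4'X_1^{*\prime},$

$X_1^* = (F_1'F_1)^{-1/2}F_1'XF_2'(F_2F_2')^{-1/2},$

$F_4 = (F_2F_2')^{-1/2}F_3(F_3'(F_2F_2')^{-1}F_3)^{-1/2},$

$d = b(F_3'(F_2F_2')^{-1}F_3)^{-1/2},$

$\hat\Xi^* = X_1^* - \hat U_2'\hat U_2X_1^*F_4(I_k-d'(dd')^{-1}d)F_4'.$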
Remark I. If we multiply $\hat U_1$ and $\hat\alpha$ on the left by any invertible matrix, the resulting matrices are also MLE's.

Remark II. All MLE's of $U_1$ and $\alpha$ are of the form $A\hat U_1$, $A\hat\alpha$, where A is some invertible matrix.

Remark III. The rows of $\hat U_1$ are themselves (left) eigenvectors corresponding to the r smallest eigenvalues of

$(F_1'F_1)^{-1/2}M(F_1'F_1)^{1/2}.$
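The passage from the general model to the reduced one can likewise be sketched. The code below (Python/NumPy; `sym_power` and `general_model_mle` are hypothetical names, and all inputs are random placeholders) forms $X_1^*$, $F_4$, and d with symmetric square roots, applies the formulas of Theorem 3.2.2, and maps back as in Theorem 3.2.3.

```python
import numpy as np

def sym_power(A, power):
    """Symmetric power (e.g. 1/2 or -1/2) of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * w**power) @ V.T

def general_model_mle(X, F1, F2, F3, b, r):
    """Sketch of Theorem 3.2.3 for X = F1 Xi F2 + E under U1 Xi F3 = alpha b."""
    c, N = X.shape
    A = F1.T @ F1                                   # F1'F1
    G = F2 @ F2.T                                   # F2F2'
    X1 = sym_power(A, -0.5) @ F1.T @ X @ F2.T @ sym_power(G, -0.5)
    S = F3.T @ np.linalg.solve(G, F3)               # F3'(F2F2')^{-1}F3
    F4 = sym_power(G, -0.5) @ F3 @ sym_power(S, -0.5)
    d = b @ sym_power(S, -0.5)
    k = F4.shape[1]
    dd_inv = np.linalg.inv(d @ d.T)
    P = np.eye(k) - d.T @ dd_inv @ d
    evals, evecs = np.linalg.eigh(X1 @ F4 @ P @ F4.T @ X1.T)
    U2 = evecs[:, :r].T
    alpha = U2 @ X1 @ F4 @ d.T @ dd_inv
    Xi_star = X1 - U2.T @ U2 @ X1 @ F4 @ P @ F4.T
    U1 = U2 @ sym_power(A, 0.5)
    Xi = sym_power(A, -0.5) @ Xi_star @ sym_power(G, -0.5)
    # tr(XX' - F1(F1'F1)^{-1}F1' X F2'(F2F2')^{-1}F2 X'): unrestricted residual SS.
    H1 = F1 @ np.linalg.solve(A, F1.T)
    H2 = F2.T @ np.linalg.solve(G, F2)
    rss = np.trace(X @ X.T - H1 @ X @ H2 @ X.T)
    sigma2 = (evals[:r].sum() + rss) / (c * N)
    return U1, alpha, Xi, sigma2

rng = np.random.default_rng(4)
c, N, p, m, k, s, r = 5, 40, 3, 8, 6, 1, 1
X = rng.standard_normal((c, N))
F1 = rng.standard_normal((c, p))
F2 = rng.standard_normal((m, N))
F3 = rng.standard_normal((m, k))
b = rng.standard_normal((s, k))
U1, alpha, Xi, sigma2 = general_model_mle(X, F1, F2, F3, b, r)
```

The estimates satisfy the hypothesis $\hat U_1\hat\Xi F_3 = \hat\alpha b$ exactly, and $cN\hat\sigma^2$ equals the residual sum of squares $\operatorname{tr}(X - F_1\hat\Xi F_2)(X - F_1\hat\Xi F_2)'$.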


3.3.  Special  Cases 

The models we consider in this section are all special cases of our reduced model. It should be noted that our reduced model can be considered as a special case of our general model if we take $F_1 = I_p$, $F_2 = I_m$, and $F_3$ to be a column orthogonal matrix.

Consider the following situation:

(3.3.1) $x_i = \xi_i + e_i; \quad i = 1, 2, \ldots, m;$

where $x_i$ is a p-dimensional vector of observations, $\xi_i$ is an unknown p-dimensional mean vector, and $e_i$ is a p-dimensional error vector. Each $e_i$ is distributed independently with a normal distribution having mean vector 0 and covariance matrix $\sigma^2 I_p$ ($\sigma^2$ is unknown). We want to estimate $\xi_i$ under the hypothesis that

(3.3.2) $U_2\xi_i = \alpha; \quad i = 1, 2, \ldots, m;$

where $\alpha$, $U_2$ are unknown $r\times 1$ and $r\times p$ matrices, respectively. If we let

$X^* = (x_1, x_2, \ldots, x_m), \quad \Xi^* = (\xi_1, \xi_2, \ldots, \xi_m),$

$E^* = (e_1, e_2, \ldots, e_m),$

then (3.3.1) and (3.3.2) can be written

$X^* = \Xi^* + E^*,$

$U_2\Xi^*F_4 = \alpha d,$

where $F_4 = I_m$ and $d = (1, 1, \ldots, 1)$. In this form our model looks identical to the model in Theorem 3.2.2. Using Theorem 3.2.2, we get the following application.


Application 1. Assume our model is $x_i = \xi_i + e_i$ and we want to estimate $U_2$, $\xi_i$, and $\alpha$ subject to $U_2\xi_i = \alpha$, where $x_i$, $\xi_i$, $e_i$, $U_2$, and $\alpha$ are defined above. Then the MLE's of $U_2$, $\alpha$, $\xi_i$, and $\sigma^2$ are:

$\hat\alpha = \hat U_2\bar x,$

$\hat\xi_i = x_i - \hat U_2'\hat U_2(x_i - \bar x),$

$\hat\sigma^2 = \frac{1}{mp}\sum_{i=p-r+1}^{p}\lambda_i,$

where $\lambda_i$ is the ith largest eigenvalue of

$M = \sum_{i=1}^{m}(x_i - \bar x)(x_i - \bar x)',$

the rows of $\hat U_2$ are (orthonormal) eigenvectors corresponding to the r smallest eigenvalues of M, and

$\bar x = \sum_{i=1}^{m}x_i/m.$


Remark I. $A\hat U_2$ and $A\hat\alpha$, where A is an invertible matrix, are also MLE's of $U_2$ and $\alpha$.
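Application 1 is the familiar principal-components computation: $\hat\xi_i$ is the projection of $x_i$ onto the affine subspace through $\bar x$ spanned by the eigenvectors of the $p-r$ largest eigenvalues of M. A minimal sketch (Python/NumPy; the function name `application1_mle` and the random data are ours):

```python
import numpy as np

def application1_mle(x, r):
    """x : p x m matrix with columns x_1, ..., x_m; r : number of restrictions."""
    p, m = x.shape
    xbar = x.mean(axis=1, keepdims=True)       # p x 1 sample mean
    xc = x - xbar
    M = xc @ xc.T                              # sum_i (x_i - xbar)(x_i - xbar)'
    evals, evecs = np.linalg.eigh(M)           # ascending eigenvalues
    U2 = evecs[:, :r].T                        # orthonormal rows, r smallest
    alpha = U2 @ xbar                          # r x 1
    xi = x - U2.T @ U2 @ xc                    # column i is xi_i-hat
    sigma2 = evals[:r].sum() / (m * p)
    return U2, alpha, xi, sigma2

rng = np.random.default_rng(5)
x = rng.standard_normal((4, 25))
U2, alpha, xi, sigma2 = application1_mle(x, r=2)
```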

In all the theorems and applications discussed so far, we have remarked that the estimator of the unknown linear restrictions ($\hat U_1$ or $\hat U_2$) is not unique. In fact any invertible matrix times $\hat U_1$ or $\hat U_2$ would also be an MLE. In the application we now discuss, we require that the last r columns of our maximum likelihood estimator of $U_2$ be the identity matrix (see the beginning of Section 1.2 and also the discussion preceding Application 4 in Section 1.3).

We will now consider the following model:

(3.3.3) $y_i = v_i + f_i; \quad i = 1, 2, \ldots, m;$

$z_i = Hv_i + \alpha + g_i; \quad i = 1, 2, \ldots, m;$

where $y_i$ and $z_i$ are $(p-r)$- and r-dimensional vectors of observed values, $v_i$ is an unknown $(p-r)\times 1$ vector, H is an unknown $r\times(p-r)$ parameter matrix, $\alpha$ is an unknown $r\times 1$ vector, and $f_i$, $g_i$ are $(p-r)$- and r-dimensional error vectors which are distributed independently of one another with normal distributions having mean vector 0 and covariance matrices $\sigma^2 I_{p-r}$ and $\sigma^2 I_r$, respectively.


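We will now rewrite the above model in such a way that it can easily be seen to be a special case of Theorem 3.2.2. Let

$X^* = \begin{pmatrix} y_1 & y_2 & \cdots & y_m \\ z_1 & z_2 & \cdots & z_m \end{pmatrix}, \quad \Xi^* = \begin{pmatrix} v_1 & v_2 & \cdots & v_m \\ Hv_1+\alpha & Hv_2+\alpha & \cdots & Hv_m+\alpha \end{pmatrix}, \quad E^* = \begin{pmatrix} f_1 & f_2 & \cdots & f_m \\ g_1 & g_2 & \cdots & g_m \end{pmatrix};$

then (3.3.3) can be formulated in the following way:

$X^* = \Xi^* + E^*,$

$(-H, I)\Xi^* = \alpha(1, 1, \ldots, 1).$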


It is clear that the above model and hypothesis are exactly the same as in Application 1, with the exception that $U_2$ must have the identity as its last r columns. If $\hat U_2 = (\hat U_{21}, \hat U_{22})$, where $\hat U_{22}$ is $r\times r$, is the estimate of $U_2$ in Application 1, then we can get the MLE of H from the following expression:

$(-\hat H, I) = (\hat U_{22}^{-1}\hat U_{21}, I) = \hat U_{22}^{-1}(\hat U_{21}, \hat U_{22}) = \hat U_{22}^{-1}\hat U_2.$

Since $(-\hat H, I)$ is an invertible matrix times $\hat U_2$, it is also an MLE of $U_2$. It is clear that when we substitute $(-\hat H, I)$ for $U_2$ in (3.2.10), Q is minimized. Since $(-\hat H, I)$ has the right form, $\hat H$ is the MLE of H.


We  summarize  our  results  in  the  following  application: 
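Application 2. Assume that our model is

$y_i = v_i + f_i; \quad i = 1, 2, \ldots, m;$

$z_i = Hv_i + \alpha + g_i; \quad i = 1, 2, \ldots, m;$

where $y_i$, $z_i$, $v_i$, $\alpha$, H, $f_i$, and $g_i$ are defined above. Then the MLE's of H, $\alpha$, $v_i$, and $\sigma^2$ are:

$\hat H = -\hat U_{22}^{-1}\hat U_{21},$

$\hat\alpha = (-\hat H, I)\begin{pmatrix} \bar y \\ \bar z \end{pmatrix},$

$\begin{pmatrix} \hat v_i \\ \hat H\hat v_i + \hat\alpha \end{pmatrix} = \begin{pmatrix} y_i \\ z_i \end{pmatrix} - (-\hat H, I)'(\hat H\hat H' + I)^{-1}(-\hat H, I)\begin{pmatrix} y_i - \bar y \\ z_i - \bar z \end{pmatrix},$

$\hat\sigma^2 = \sum_{i=p-r+1}^{p}\lambda_i/mp,$

where $\lambda_i$ is the ith largest eigenvalue of M, the rows of $(\hat U_{21}, \hat U_{22})$ are the eigenvectors associated with the r smallest eigenvalues of M, $\bar y = \sum_{i=1}^{m}y_i/m$, $\bar z = \sum_{i=1}^{m}z_i/m$, and

$M = \sum_{i=1}^{m}\begin{pmatrix} y_i - \bar y \\ z_i - \bar z \end{pmatrix}\begin{pmatrix} y_i - \bar y \\ z_i - \bar z \end{pmatrix}'.$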


Remark I. $\hat H$, $\hat\alpha$ are unique.

Application 2 is a generalization of the model considered first by Gleser and Watson [1973] and later by Bhargava [1975]. The proofs used in these papers cannot be generalized to cover our case. Their model is the special case of Application 2 in which $\alpha = 0$.
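The normalization $(-\hat H, I) = \hat U_{22}^{-1}\hat U_2$ in Application 2 can be sketched as follows (Python/NumPy; `application2_mle` is a hypothetical name). Noise-free data generated from a chosen H and $\alpha$ satisfy the restriction exactly, so under that assumption the estimates should reproduce them up to rounding.

```python
import numpy as np

def application2_mle(y, z):
    """y : (p-r) x m, z : r x m. Returns H-hat, alpha-hat, v-hat, sigma2-hat."""
    q, m = y.shape
    r = z.shape[0]
    x = np.vstack([y, z])                          # p x m
    xbar = x.mean(axis=1, keepdims=True)
    xc = x - xbar
    evals, evecs = np.linalg.eigh(xc @ xc.T)       # M of Application 2
    U2 = evecs[:, :r].T                            # (U21, U22), r smallest
    U21, U22 = U2[:, :q], U2[:, q:]                # U22 is r x r
    H = -np.linalg.solve(U22, U21)                 # (-H, I) = U22^{-1} U2
    B = np.hstack([-H, np.eye(r)])                 # (-H, I)
    alpha = B @ xbar
    # Project each (y_i; z_i) onto the affine set {w : B w = alpha}.
    fitted = x - B.T @ np.linalg.solve(B @ B.T, B @ x - alpha)
    sigma2 = evals[:r].sum() / (m * (q + r))
    return H, alpha, fitted[:q], sigma2

rng = np.random.default_rng(6)
H_true = np.array([[2.0, -1.0]])                   # r = 1, p - r = 2
alpha_true = np.array([[0.5]])
v = rng.standard_normal((2, 50))
y, z = v, H_true @ v + alpha_true                  # noise-free: restriction holds exactly
H, alpha, v_hat, sigma2 = application2_mle(y, z)
```

Whatever basis of the eigenspace `eigh` returns, left multiplication by $\hat U_{22}^{-1}$ removes the arbitrariness, which is the uniqueness stated in Remark I.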


We will conclude this section with a model which can be considered a combination of the "error-in-variables" model and the usual linear regression model. In the model we discuss there are some variables which are measured with error and other variables which are measured perfectly. Consider the following model:

$y_i = v_i + f_i; \quad i = 1, 2, \ldots, m;$

$z_i = Hv_i + \alpha d_i + g_i; \quad i = 1, 2, \ldots, m;$

where $y_i$, $z_i$, H, $v_i$, $f_i$, and $g_i$ are the same as in Application 2, $\alpha$ is an unknown $r\times s$ matrix, and $d_i$ is a known $s\times 1$ matrix. Here $v_i$ is the variable which is measured with error and $d_i$ is the variable which is measured perfectly. We may apply Theorem 3.2.2 in a manner similar to what we did for Application 2 to get MLE's of H, $v_i$, and $\alpha$. If we do this and use the fact that $d = (d_1, d_2, \ldots, d_m)$, we get

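Application 3. Let our model be

$y_i = v_i + f_i; \quad i = 1, 2, \ldots, m;$

$z_i = Hv_i + \alpha d_i + g_i; \quad i = 1, 2, \ldots, m;$

where $y_i$, $z_i$, H, $v_i$, $f_i$, and $g_i$ are the same as in Application 2, $\alpha$ is an unknown $r\times s$ matrix, and $d_i$ is a known $s\times 1$ matrix. The MLE's of H, $\alpha$, $v_i$, and $\sigma^2$ are:

$\hat H = -\hat U_{22}^{-1}\hat U_{21},$

$\hat\alpha = (-\hat H, I)\Bigl(\sum_{i=1}^{m}\begin{pmatrix} y_i \\ z_i \end{pmatrix}d_i'\Bigr)\Bigl(\sum_{i=1}^{m}d_id_i'\Bigr)^{-1},$

$\begin{pmatrix} \hat v_i \\ \hat H\hat v_i + \hat\alpha d_i \end{pmatrix} = \begin{pmatrix} y_i \\ z_i \end{pmatrix} - (-\hat H, I)'[(-\hat H, I)(-\hat H, I)']^{-1}\Bigl[(-\hat H, I)\begin{pmatrix} y_i \\ z_i \end{pmatrix} - \hat\alpha d_i\Bigr],$

$\hat\sigma^2 = \sum_{i=p-r+1}^{p}\lambda_i/mp,$

where $\lambda_i$ is the ith largest eigenvalue of M, the rows of $(\hat U_{21}, \hat U_{22})$ are eigenvectors associated with the r smallest eigenvalues of M, and

$M = \sum_{i=1}^{m}\begin{pmatrix} y_i \\ z_i \end{pmatrix}\begin{pmatrix} y_i \\ z_i \end{pmatrix}' - \Bigl(\sum_{i=1}^{m}\begin{pmatrix} y_i \\ z_i \end{pmatrix}d_i'\Bigr)\Bigl(\sum_{i=1}^{m}d_id_i'\Bigr)^{-1}\Bigl(\sum_{i=1}^{m}d_i(y_i', z_i')\Bigr).$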


Remark I. $\hat H$, $\hat\alpha$ are unique MLE's.
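A corresponding sketch for Application 3 (Python/NumPy; `application3_mle` and the simulated inputs are ours) replaces the centering matrix of Application 2 by $I_m - d'(dd')^{-1}d$. Again noise-free data are assumed, so that the output can be checked against the chosen H and $\alpha$.

```python
import numpy as np

def application3_mle(y, z, d):
    """y : (p-r) x m, z : r x m, d : s x m with columns d_i (measured without error)."""
    q, m = y.shape
    r = z.shape[0]
    x = np.vstack([y, z])                          # p x m
    dd_inv = np.linalg.inv(d @ d.T)                # (sum_i d_i d_i')^{-1}
    M = x @ (np.eye(m) - d.T @ dd_inv @ d) @ x.T   # x (I_m - d'(dd')^{-1} d) x'
    evals, evecs = np.linalg.eigh(M)
    U2 = evecs[:, :r].T                            # r smallest eigenvalues
    H = -np.linalg.solve(U2[:, q:], U2[:, :q])     # (-H, I) = U22^{-1} U2
    B = np.hstack([-H, np.eye(r)])
    alpha = B @ x @ d.T @ dd_inv                   # (-H, I)(sum x_i d_i')(sum d_i d_i')^{-1}
    fitted = x - B.T @ np.linalg.solve(B @ B.T, B @ x - alpha @ d)
    sigma2 = evals[:r].sum() / (m * (q + r))
    return H, alpha, fitted[:q], sigma2

rng = np.random.default_rng(7)
m = 40
H_true = np.array([[1.0, 3.0]])                    # r = 1, p - r = 2
alpha_true = np.array([[0.5, -2.0]])               # r x s with s = 2
d = np.vstack([np.ones(m), rng.standard_normal(m)])  # intercept row plus one covariate
v = rng.standard_normal((2, m))
y, z = v, H_true @ v + alpha_true @ d              # noise-free data
H, alpha, v_hat, sigma2 = application3_mle(y, z, d)
```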


3.4. Consistency of the Estimators


In  this  section  we  discuss  the  consistency  of  the  estimators 
from  Section  3.2.  We  first  work  with  our  reduced  model.  All  the 
results  for  the  reduced  model  are  rigorously  proved.  For  the  general 
model,  we  merely  state  our  results  since  they  follow  from  the 
results  for  the  reduced  model. 

Let us consider $\hat U_2$, $\hat\alpha$, the estimators of $U_2$, $\alpha$ in our model. In order to make a discussion of the consistency of $\hat U_2$ and $\hat\alpha$ meaningful, we have to place restrictions on $U_2$ and $\hat U_2$ which will make them unique. Our arguments here are the same as in Section 1.2 of Chapter 1. Let $(U_2^*, \alpha^*)$ be the unique members of the class of matrices $(U_2, \alpha)$ which satisfy $U_2\Xi^*F_4 = \alpha d$, where $U_2$ has the identity matrix as its last r columns. Let $\hat U_2^*$ be the unique MLE of $U_2$ which has the identity matrix as its last r columns, and let $\hat\alpha^*$ be the corresponding MLE of $\alpha$. In Application 2 and Application 3 of the previous section, we satisfy this requirement. We will show that $\hat U_2^*$, $\hat\alpha^*$ are strongly consistent estimators of $U_2^*$, $\alpha^*$. First, we will prove some useful lemmas.

Lemma 1. Assume that

$m^{-1}\Xi^*F_4(I_k-d'(dd')^{-1}d)F_4'\Xi^{*\prime}$

converges to a finite matrix R. Then

$m^{-1}M = m^{-1}X_1^*F_4(I_k-d'(dd')^{-1}d)F_4'X_1^{*\prime}$

goes almost surely to $R + (1-t_1)\sigma^2 I_p$, where

$t_1 = \lim_{m\to\infty}(m-(k-s))m^{-1} = \lim_{m\to\infty}(m-k)m^{-1}.$

Proof. Consider $X_1^*$, which is a $p\times m$ matrix. Each column of $X_1^*$ has an independent p-dimensional normal distribution with covariance matrix $\sigma^2 I_p$. The mean of $X_1^*$ is $\Xi^*$. $X_1^*F_4$ is a $p\times k$ matrix. Since $F_4$ is a column orthogonal matrix, each column of $X_1^*F_4$ is distributed independently with a p-dimensional normal distribution with covariance matrix $\sigma^2 I_p$. The mean of $X_1^*F_4$ is $\Xi^*F_4$. We have

$m^{-1}X_1^*F_4(I_k-d'(dd')^{-1}d)F_4'X_1^{*\prime} = m^{-1}(\Xi^*+E^*)F_4(I_k-d'(dd')^{-1}d)F_4'(\Xi^*+E^*)'$

(3.4.1) $\quad = m^{-1}\Xi^*F_4(I_k-d'(dd')^{-1}d)F_4'\Xi^{*\prime} + m^{-1}E^*F_4(I_k-d'(dd')^{-1}d)F_4'\Xi^{*\prime}$

$\qquad + m^{-1}\Xi^*F_4(I_k-d'(dd')^{-1}d)F_4'E^{*\prime} + m^{-1}E^*F_4(I_k-d'(dd')^{-1}d)F_4'E^{*\prime}.$

By our assumptions and Lemma 2 of Chapter 1, the 2nd and 3rd terms on the right-hand side of (3.4.1) go almost surely to 0. By our assumptions, the 1st term goes to R. If we use Theorem 4.3.2 in Anderson [1958], we find that the last term has the same distribution as

$m^{-1}\sum_{i=1}^{k-s}u_iu_i',$

where $u_i$ has a normal distribution independent of $u_j$ ($i \ne j$) with mean vector 0 and covariance matrix $\sigma^2 I_p$. We know that

$m^{-1}\sum_{i=1}^{k-s}u_iu_i' = \frac{k-s}{m}\sum_{i=1}^{k-s}u_iu_i'/(k-s)$

goes almost surely to $(1-t_1)\sigma^2 I_p$. If we combine all the above statements, we get

$m^{-1}X_1^*F_4(I_k-d'(dd')^{-1}d)F_4'X_1^{*\prime} \xrightarrow{\text{a.s.}} R + (1-t_1)\sigma^2 I_p.$

Q.E.D.


Lemma 2. If R is finite and of rank $p-r$, then the only matrix which has the identity as its last r columns and which yields 0 when multiplied by R is $U_2^*$.

Proof. $m^{-1}U_2^*\bigl(\Xi^*F_4(I_k-d'(dd')^{-1}d)F_4'\Xi^{*\prime}\bigr) = m^{-1}\alpha^*d(I_k-d'(dd')^{-1}d)F_4'\Xi^{*\prime} = 0$, since $d(I_k-d'(dd')^{-1}d) = 0$. Since this is true for every m, it is true in the limit, i.e., $U_2^*R = 0$. Since R has rank $p-r$, it has a unique r-dimensional space of eigenvectors associated with eigenvalue 0. Let us consider a matrix whose rows form a basis for this eigenspace. If this matrix is to have the identity matrix as its last r columns, it is clear that this matrix must be $U_2^*$.

Q.E.D.


Theorem 3.4.1. Under the assumptions of Lemma 1 and Lemma 2, $\hat U_2^*$ is a strongly consistent estimate of $U_2^*$.

Proof. By Lemma 1, we know that $m^{-1}M$ goes almost surely to $R + (1-t_1)\sigma^2 I_p$. Since the eigenvalues of a matrix are continuous functions of that matrix, we are able to conclude that the r smallest eigenvalues of $m^{-1}M$ converge almost surely to the smallest eigenvalue of $R + (1-t_1)\sigma^2 I_p$, which is $(1-t_1)\sigma^2$ (R is nonnegative definite, being a limit of nonnegative definite matrices, and has r zero eigenvalues). By Lemma 2, $U_2^*$ is the only matrix with the identity as its last r columns which satisfies $U_2^*R = 0$. We may conclude that $U_2^*$ is the only matrix of the right form whose rows are eigenvectors associated with $(1-t_1)\sigma^2$, the smallest eigenvalue of $R + (1-t_1)\sigma^2 I_p$.

Let $\hat U_{2m}^*$ be the estimate of $U_2^*$ if we have m observations. Let $\hat U_{2m}$ be the estimate given in Theorem 3.2.2 used to generate $\hat U_{2m}^*$, i.e.,

$\hat U_{2m}^* = (\hat U_{2m}^{(2)})^{-1}\hat U_{2m},$

where $\hat U_{2m} = (\hat U_{2m}^{(1)}, \hat U_{2m}^{(2)})$ and $\hat U_{2m}^{(2)}$ is $r\times r$. Since $\hat U_{2m}\hat U_{2m}' = I_r$, $\hat U_{2m}$ is bounded almost surely. Let us pick any subsequence of $\hat U_{2m}$. Since $\hat U_{2m}$ is bounded almost surely, there must exist a subsequence of this subsequence which converges. Let $\hat U_{2m_j}$ denote the convergent subsequence. Also let

$C = \lim_{j\to\infty}\hat U_{2m_j}.$

Every row of C is the limit of a sequence of eigenvectors of $m^{-1}M$ associated with one of the r smallest eigenvalues. Since $m^{-1}M$ converges almost surely to $R + (1-t_1)\sigma^2 I_p$, each row of C must equal some eigenvector of $R + (1-t_1)\sigma^2 I_p$ associated with $(1-t_1)\sigma^2$. Since

$\lim_{j\to\infty}\hat U_{2m_j}\hat U_{2m_j}' = CC' = I_r,$

C is of full row rank and therefore its rows must span the space of eigenvectors of $R + (1-t_1)\sigma^2 I_p$ associated with $(1-t_1)\sigma^2$. We already showed that $U_2^*$ spans that same space. Hence $U_2^* = AC$ for some invertible A; comparing last r columns gives $I_r = AC^{(2)}$, so $C^{(2)}$ is invertible and we have

$U_2^* = (C^{(2)})^{-1}(C^{(1)}, C^{(2)}) = (C^{(2)})^{-1}C,$

where $C = (C^{(1)}, C^{(2)})$.

Let $\|A\|$ denote the largest absolute value of the elements of A; note that $\|AB\| \le r\|A\|\,\|B\|$ when A has r columns. We will now show that $\|\hat U_{2m_j}^* - U_2^*\|$ goes to 0 almost surely. We have

(3.4.4) $\|\hat U_{2m_j}^* - U_2^*\| = \|(\hat U_{2m_j}^{(2)})^{-1}\hat U_{2m_j} - (C^{(2)})^{-1}C\|$

$\quad \le \|(\hat U_{2m_j}^{(2)})^{-1}\hat U_{2m_j} - (C^{(2)})^{-1}\hat U_{2m_j}\| + \|(C^{(2)})^{-1}\hat U_{2m_j} - (C^{(2)})^{-1}C\|$

$\quad \le r\,\|\hat U_{2m_j}\|\,\|(\hat U_{2m_j}^{(2)})^{-1} - (C^{(2)})^{-1}\| + r\,\|(C^{(2)})^{-1}\|\,\|\hat U_{2m_j} - C\|.$

The first term on the right-hand side of (3.4.4), $r\|\hat U_{2m_j}\|\,\|(\hat U_{2m_j}^{(2)})^{-1} - (C^{(2)})^{-1}\|$, is arbitrarily small since $\hat U_{2m_j}$ is bounded almost surely and $\hat U_{2m_j}^{(2)}$ converges to $C^{(2)}$, so that $(\hat U_{2m_j}^{(2)})^{-1}$ converges to $(C^{(2)})^{-1}$. Since $(C^{(2)})^{-1}$ is bounded and $\hat U_{2m_j}$ goes almost surely to C, $\|C - \hat U_{2m_j}\|$ goes almost surely to zero. Combining the above statements, we have that $\hat U_{2m_j}^*$ goes almost surely to $U_2^*$. We have shown that for any subsequence of $\hat U_{2m}^*$ there exists a subsequence of that subsequence which converges to $U_2^*$ almost surely. Therefore $\hat U_{2m}^*$ must converge almost surely to $U_2^*$. Q.E.D.

We now discuss the consistency of $\hat\alpha^* = \hat U_2^*X_1^*F_4d'(dd')^{-1}$.

Theorem 3.4.2. If $m(dd')^{-1}$ converges to a matrix with all elements finite, then $\hat\alpha^*$ is a strongly consistent estimate of $\alpha^*$, where $\alpha^*$ satisfies $U_2^*\Xi^*F_4 = \alpha^*d$.

Proof. Note that

$\hat\alpha^* = \hat U_2^*X_1^*F_4d'(dd')^{-1}$

$\quad = \hat U_2^*(\Xi^*+E^*)F_4d'(dd')^{-1}$

$\quad = \hat U_2^*\Xi^*F_4d'(dd')^{-1} + \hat U_2^*E^*F_4d'(dd')^{-1}.$

Since $\hat U_2^*$ goes almost surely to $U_2^*$, $\hat U_2^*\Xi^*F_4d'(dd')^{-1}$ goes almost surely to

$U_2^*\Xi^*F_4d'(dd')^{-1} = \alpha^*dd'(dd')^{-1} = \alpha^*.$

If we apply Lemma 2 of Chapter 1, we get that $E^*F_4d'(dd')^{-1}$ goes almost surely to 0. We therefore have that $\hat U_2^*E^*F_4d'(dd')^{-1}$ goes almost surely to zero. Q.E.D.

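We now show that the MLE of $\sigma^2$ in the reduced model is not
$X_1^*F_4(I_k-d'(dd')^{-1}d)F_4'X_1^{*\prime}/m$, we have

$\hat\sigma^2 = \frac{1}{mp}\sum_{i=p-r+1}^{p}\lambda_i = \frac{1}{p}\sum_{i=p-r+1}^{p}\frac{\lambda_i}{m} \xrightarrow{\text{a.s.}} \frac{r}{p}(1-t_1)\sigma^2.$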


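Straightforward substitution yields the following result:

$\hat\Xi^*F_4(I_k-d'(dd')^{-1}d)F_4'\hat\Xi^{*\prime}/m \xrightarrow{\text{a.s.}} R + (1-t_1)\sigma^2\bigl(I_p - U_2^{*\prime}(U_2^*U_2^{*\prime})^{-1}U_2^*\bigr).$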
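The limit of $\hat\sigma^2$ stated above, and the consistency of Theorem 3.4.1, can be observed in a small simulation of the model of Application 2 with $p = 3$, $r = 1$, so that $F_4 = I_m$, $k = m$, $s = 1$, and $t_1 = 0$. The sketch (Python/NumPy) assumes, for illustration only, that the true $v_i$ are drawn from $N(0, 4I_2)$ and that $\sigma = 0.5$. Under these assumptions $\hat H$ should approach H, while $\hat\sigma^2$ should approach $(r/p)\sigma^2 = \sigma^2/3$ rather than $\sigma^2$.

```python
import numpy as np

def fit_application2(y, z):
    """H-hat and sigma2-hat of Application 2."""
    q, r = y.shape[0], z.shape[0]
    x = np.vstack([y, z])
    m = x.shape[1]
    xc = x - x.mean(axis=1, keepdims=True)
    evals, evecs = np.linalg.eigh(xc @ xc.T)
    U2 = evecs[:, :r].T
    H = -np.linalg.solve(U2[:, q:], U2[:, :q])
    return H, evals[:r].sum() / (m * (q + r))

rng = np.random.default_rng(3)
H_true = np.array([[1.5, -0.5]])     # p - r = 2, r = 1, so p = 3
alpha_true = np.array([[1.0]])
sigma = 0.5
results = {}
for m in (100, 1_000, 100_000):
    v = 2.0 * rng.standard_normal((2, m))          # assumed spread of the true v_i
    y = v + sigma * rng.standard_normal((2, m))
    z = H_true @ v + alpha_true + sigma * rng.standard_normal((1, m))
    results[m] = fit_application2(y, z)

for m, (H_hat, s2_hat) in results.items():
    print(m, H_hat.ravel(), s2_hat, "limit (r/p) sigma^2 =", sigma**2 / 3)
```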


Let us now consider the parameters in the general model. For the definitions of all terms see Theorem 3.2.3 and the beginning of Section 3.1. Let $\hat U_1^*$ be the MLE of $U_1$ which has the identity as its last r columns. Let $\hat\alpha^*$ be the corresponding value of $\hat\alpha$. Let $U_1^*$, $\alpha^*$ be the parameter matrices in the population which satisfy $U_1^*\Xi F_3 = \alpha^*b$, where $U_1^*$ has the identity matrix as its last r columns. We could prove the following theorem in a way analogous to what we did for the reduced model.


Theorem 3.4.3. If our model is the model of Theorem 3.2.3 and if

$N^{-1}\Xi^*F_4(I_k-d'(dd')^{-1}d)F_4'\Xi^{*\prime} = N^{-1}(F_1'F_1)^{1/2}\Xi(F_2F_2')^{1/2}F_4(I_k-d'(dd')^{-1}d)F_4'(F_2F_2')^{1/2}\Xi'(F_1'F_1)^{1/2}$

converges to a finite matrix R of rank $p-r$, and if $t_1 = \lim_{N\to\infty}(N-(k-s))N^{-1}$, then

i) $N^{-1}X_1^*F_4(I_k-d'(dd')^{-1}d)F_4'X_1^{*\prime}$ goes almost surely to $R + (1-t_1)\sigma^2 I_p$;

ii) the rows of $U_1^*$ are (left) eigenvectors of $(F_1'F_1)^{-1/2}R$ corresponding to eigenvalue 0;

iii) $\hat U_1^*$ is a strongly consistent estimate of $U_1^*$;

iv) if $N(dd')^{-1}$ converges to a finite matrix, then $\hat\alpha^*$ is a strongly consistent estimate of $\alpha^*$;

v) $(cN)^{-1}\sum_{i=p-r+1}^{p}\lambda_i$ goes almost surely to $(1-t_1)\sigma^2 r/c$.

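Since

$(\sigma^2)^{-1}\operatorname{tr}(XX' - F_1(F_1'F_1)^{-1}F_1'XF_2'(F_2F_2')^{-1}F_2X')$

has a chi-square distribution with $cN-pm$ degrees of freedom, we have

(3.4.6) $\operatorname{tr}(XX' - F_1(F_1'F_1)^{-1}F_1'XF_2'(F_2F_2')^{-1}F_2X')/\{\sigma^2(cN-pm)\} \xrightarrow{\text{a.s.}} 1,$

provided that $cN-pm$ goes to $\infty$ as N does. If $\lim_{N\to\infty}(N-m)N^{-1} = t_2$, we have

$\operatorname{tr}(XX' - F_1(F_1'F_1)^{-1}F_1'XF_2'(F_2F_2')^{-1}F_2X')/Nc \xrightarrow{\text{a.s.}} \sigma^2\bigl(1 - (p/c)(1-t_2)\bigr).$

If we combine the above statement and v) of Theorem 3.4.3, we get that

$\hat\sigma^2 \xrightarrow{\text{a.s.}} \sigma^2\Bigl(1 + \frac{1}{c}\bigl((1-t_1)r - (1-t_2)p\bigr)\Bigr).$

Since $p > r$, and $1-t_2 = \lim_{N\to\infty}m/N \ge \lim_{N\to\infty}(k-s)/N = 1-t_1$ (because $m \ge k$), $\hat\sigma^2$ asymptotically underestimates $\sigma^2$ whenever $\lim_{N\to\infty}m/N > 0$.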
CHAPTER 4

TESTING  THE  EXISTENCE  OF  UNKNOWN  LINEAR 
RESTRICTIONS  IN  A GENERAL  LINEAR  MODEL 

4.0.  Introduction 

Let our model be the Potthoff-Roy model:

(4.0.1) $X = F_1\Xi F_2 + E,$

where X is a $c\times N$ matrix of observations, $F_1$ is a known $c\times p$ matrix, $\Xi$ is an unknown $p\times m$ parameter matrix, $F_2$ is a known $m\times N$ ($N \ge m$) matrix, and E is a $c\times N$ error matrix whose columns are distributed independently with a normal distribution having mean vector 0 and covariance matrix $\sigma^2 I_c$ ($\sigma^2$ is unknown). In this chapter we will be concerned with testing

(4.0.2) $H_0\colon U_1\Xi F_3 = \alpha b$ versus $H_1\colon U_1\Xi F_3 \ne \alpha b$,

where $U_1$ is an unknown $r\times p$ matrix, $F_3$ is a known $m\times k$ ($m \ge k$) matrix, $\alpha$ is an unknown $r\times s$ matrix, and b is a known $s\times k$ matrix.

In Section 4.1, we derive the likelihood ratio test statistic for $H_0$ versus $H_1$. In Section 4.2, we find the asymptotic distribution of the roots needed in the likelihood ratio criterion. In Section 4.3, we use the asymptotic distributions of the likelihood ratio test statistic to get asymptotic tests of $H_0$ versus $H_1$. In Section 4.4, we show that the tests from the preceding section are consistent.
4.1. Likelihood Ratio Test Statistic

In this section we find the likelihood ratio test of $H_0\colon U_1\Xi F_3 = \alpha b$ versus $H_1\colon U_1\Xi F_3 \ne \alpha b$, where our model is given by

(4.1.1) $X = F_1\Xi F_2 + E.$

All variables are defined in the introduction to this chapter. Our result can be summarized in the following theorem.


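Theorem 4.1.1. If our model is given by (4.1.1) and we wish to test the hypothesis $H_0\colon U_1\Xi F_3 = \alpha b$ versus $H_1\colon U_1\Xi F_3 \ne \alpha b$, then the likelihood ratio test statistic is

$\Lambda_1 = \left[\frac{\operatorname{tr}(XX' - F_1(F_1'F_1)^{-1}F_1'XF_2'(F_2F_2')^{-1}F_2X')}{\sum_{i=p-r+1}^{p}\lambda_i + \operatorname{tr}(XX' - F_1(F_1'F_1)^{-1}F_1'XF_2'(F_2F_2')^{-1}F_2X')}\right]^{\frac12 cN},$

where $\lambda_i$ is the ith largest eigenvalue of M, and

$M = X_1^*F_4(I_k-d'(dd')^{-1}d)F_4'X_1^{*\prime},$

$X_1^* = (F_1'F_1)^{-1/2}F_1'XF_2'(F_2F_2')^{-1/2},$

$F_4 = (F_2F_2')^{-1/2}F_3(F_3'(F_2F_2')^{-1}F_3)^{-1/2},$

$d = b(F_3'(F_2F_2')^{-1}F_3)^{-1/2}.$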


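Proof. We need the maximum value of the likelihood when $H_0$ is true and when $H_1$ is true. In Chapter 3, we derived the MLE's of $U_1$, $\Xi$, $\alpha$, and $\sigma^2$ when $H_0$ is true (see Theorem 3.2.3). If we substitute these estimators into the likelihood we get:

(4.1.2) $\max_{H_0} L = (2\pi\hat\sigma^2)^{-\frac12 cN}\exp\{-(2\hat\sigma^2)^{-1}\operatorname{tr}[(X-F_1\hat\Xi F_2)(X-F_1\hat\Xi F_2)']\}$

$\quad = (2\pi)^{-\frac12 cN}(\hat\sigma^2)^{-\frac12 cN}e^{-\frac12 cN}$

$\quad = (2\pi)^{-\frac12 cN}e^{-\frac12 cN}\Bigl[\frac{1}{cN}\Bigl(\sum_{i=p-r+1}^{p}\lambda_i + \operatorname{tr}(X_2^*X_2^{*\prime}) + \operatorname{tr}(X_3^*X_3^{*\prime}) + \operatorname{tr}(X_4^*X_4^{*\prime})\Bigr)\Bigr]^{-\frac12 cN}$

$\quad = (2\pi e)^{-\frac12 cN}\Bigl[\frac{1}{cN}\Bigl(\sum_{i=p-r+1}^{p}\lambda_i + \operatorname{tr}(XX' - F_1(F_1'F_1)^{-1}F_1'XF_2'(F_2F_2')^{-1}F_2X')\Bigr)\Bigr]^{-\frac12 cN},$

where $\lambda_i$ is the ith largest root of M, and M and the variables which define it are given in Theorem 4.1.1. For definitions of $X_2^*$, $X_3^*$, $X_4^*$, see Section 3.1.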

We now get the maximum value of the likelihood when the alternative is true. When the alternative is true, our model is just $X = F_1\Xi F_2 + E$ with no restrictions on $\Xi$. The columns of E have the same distribution as under $H_0$. The likelihood function is

(4.1.3) $L(\Xi, \sigma^2) = (2\pi\sigma^2)^{-\frac12 cN}\exp\{-(2\sigma^2)^{-1}\operatorname{tr}(X-F_1\Xi F_2)(X-F_1\Xi F_2)'\}.$

If we use standard multivariate regression procedures, we get that the MLE of $\Xi$ is

(4.1.4) $\hat\Xi = (F_1'F_1)^{-1}F_1'XF_2'(F_2F_2')^{-1}.$

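The MLE of $\sigma^2$ is also easy to get:

(4.1.5) $\hat\sigma^2 = (cN)^{-1}\operatorname{tr}(XX' - F_1(F_1'F_1)^{-1}F_1'XF_2'(F_2F_2')^{-1}F_2X').$

When we substitute (4.1.5) and (4.1.4) into (4.1.3), we get that the maximum value of the likelihood when the alternative is true is

(4.1.6) $\max_{H_1} L = (2\pi e)^{-\frac12 cN}\Bigl[\frac{1}{cN}\operatorname{tr}(XX' - F_1(F_1'F_1)^{-1}F_1'XF_2'(F_2F_2')^{-1}F_2X')\Bigr]^{-\frac12 cN}.$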

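If we combine (4.1.2) and (4.1.6), we get the likelihood ratio test statistic of $H_0$ versus $H_1$:

$\Lambda_1 = \frac{\max_{H_0} L}{\max_{H_1} L} = \frac{[\operatorname{tr}(XX' - F_1(F_1'F_1)^{-1}F_1'XF_2'(F_2F_2')^{-1}F_2X')]^{\frac12 cN}}{[\sum_{i=p-r+1}^{p}\lambda_i + \operatorname{tr}(XX' - F_1(F_1'F_1)^{-1}F_1'XF_2'(F_2F_2')^{-1}F_2X')]^{\frac12 cN}}.$

Q.E.D.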


Remark. It is clear that the likelihood ratio test statistic is a function of

$\Lambda_2 = \sum_{i=p-r+1}^{p}\lambda_i\Big/\operatorname{tr}(XX' - F_1(F_1'F_1)^{-1}F_1'XF_2'(F_2F_2')^{-1}F_2X') = \Lambda_1^{-2/cN} - 1.$

The numerator and denominator of the above expression are independent since $\sum_{i=p-r+1}^{p}\lambda_i$ is a function of $X_1^*$, the denominator is a function of $X_2^*$, $X_3^*$, and $X_4^*$, and $X_1^*$ is independent of $X_2^*$, $X_3^*$, and $X_4^*$ (see Section 3.1 for definitions of $X_2^*$, $X_3^*$, and $X_4^*$).
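A sketch of the computation behind Theorem 4.1.1 and the Remark (Python/NumPy; `lr_statistic` and `sym_power` are hypothetical names, and the inputs are random placeholders). It returns $\Lambda_2$ and $\log\Lambda_1$; working on the log scale avoids underflow, since $\Lambda_1 = (1+\Lambda_2)^{-cN/2}$ involves the large power $cN/2$.

```python
import numpy as np

def sym_power(A, power):
    """Symmetric power of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * w**power) @ V.T

def lr_statistic(X, F1, F2, F3, b, r):
    """Return (Lambda_2, log Lambda_1) for H0: U1 Xi F3 = alpha b."""
    c, N = X.shape
    A, G = F1.T @ F1, F2 @ F2.T
    X1 = sym_power(A, -0.5) @ F1.T @ X @ F2.T @ sym_power(G, -0.5)
    S = F3.T @ np.linalg.solve(G, F3)
    F4 = sym_power(G, -0.5) @ F3 @ sym_power(S, -0.5)
    d = b @ sym_power(S, -0.5)
    P = np.eye(F4.shape[1]) - d.T @ np.linalg.solve(d @ d.T, d)
    lam = np.linalg.eigvalsh(X1 @ F4 @ P @ F4.T @ X1.T)    # ascending order
    H1 = F1 @ np.linalg.solve(A, F1.T)                     # F1(F1'F1)^{-1}F1'
    H2 = F2.T @ np.linalg.solve(G, F2)                     # F2'(F2F2')^{-1}F2
    rss = np.trace(X @ X.T - H1 @ X @ H2 @ X.T)
    lam2 = lam[:r].sum() / rss
    log_lam1 = -0.5 * c * N * np.log1p(lam2)               # Lambda_1 = (1 + Lambda_2)^{-cN/2}
    return lam2, log_lam1

rng = np.random.default_rng(8)
c, N, p, m, k, s, r = 6, 50, 3, 8, 6, 2, 1
X = rng.standard_normal((c, N))
F1 = rng.standard_normal((c, p))
F2 = rng.standard_normal((m, N))
F3 = rng.standard_normal((m, k))
b = rng.standard_normal((s, k))
lam2, log_lam1 = lr_statistic(X, F1, F2, F3, b, r)
```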


Remark II. The likelihood function can be made arbitrarily large if $F_1 = I_p$ and $F_2 = I_m$ by taking $\Xi = X$ and $\sigma = \epsilon$, where $\epsilon$ is an arbitrarily small positive number. Because of this, there does not exist a likelihood ratio test of the hypothesis $U_2\Xi^*F_4 = \alpha d$ versus $U_2\Xi^*F_4 \ne \alpha d$ in the reduced model. What causes the problem is that under the alternative hypothesis, there is nothing left to estimate $\sigma^2$ after we fit $\Xi$. We will therefore assume that $\lim_{N\to\infty}(Nc - mp) = \infty$ when we test $U_1\Xi F_3 = \alpha b$ versus $U_1\Xi F_3 \ne \alpha b$.

4.2. Asymptotic Distribution of the Roots

In this section, we find the asymptotic distribution of the roots needed in the likelihood ratio tests. We are interested in the r smallest roots of

$|M - \lambda I_p| = 0,$

where

(4.2.1) $M = X_1^*F_4(I_k-d'(dd')^{-1}d)F_4'X_1^{*\prime}.$

It is helpful to work with the r smallest roots of

$|(N\sigma^2)^{-1}M - \phi^*I_p| = 0.$

It should be noted that $\phi^* = (N\sigma^2)^{-1}\lambda$.

We now prove a useful lemma which is similar to Lemma 1 of Chapter 2.

Lemma 1. Let our model and hypothesis be given by

$X = F_1\Xi F_2 + E,$

$U_1\Xi F_3 = \alpha b,$

where X, $F_1$, $F_2$, E, $U_1$, $F_3$, $\alpha$, and b are defined in the introduction of this chapter. The roots of

(4.2.2) $|(N\sigma^2)^{-1}M - \phi^*I_p| = 0,$

where M is given by (4.2.1), have the same distribution as the roots of

(4.2.3) $|N^{-1}U^*U^{*\prime} + N^{-1/2}C + D_0 - \phi^*I_p| = 0,$

where

and $\gamma_{iN}$ is the ith largest eigenvalue of

$(N\sigma^2)^{-1}\Xi^*F_4(I_k-d'(dd')^{-1}d)F_4'\Xi^{*\prime},$

and $U^*$ is a $p\times(k-s)$ matrix whose columns have independent normal distributions with mean vector 0 and covariance matrix $I_p$.


Proof. First, consider

$X_1^* = (F_1'F_1)^{-1/2}F_1'XF_2'(F_2F_2')^{-1/2}.$

When we take the square root (inverse square root) of a matrix, it is always the unique symmetric positive definite square root (inverse square root). The columns of $X_1^*$ are independently normally distributed with covariance matrix $\sigma^2 I_p$. The mean of $X_1^*$ is

$\Xi^* = (F_1'F_1)^{1/2}\Xi(F_2F_2')^{1/2}.$

Now consider $\sigma^{-1}X_1^*F_4$, where

$F_4 = (F_2F_2')^{-1/2}F_3(F_3'(F_2F_2')^{-1}F_3)^{-1/2}.$

Since $F_4$ is a column orthogonal matrix, each column of $\sigma^{-1}X_1^*F_4$ is distributed independently with a normal distribution having covariance matrix $I_p$. Next consider $\sigma^{-1}X_1^*F_4V_6$, where $V_6$ is a matrix such that

$V_6'V_6 = I_{k-s}, \quad V_6V_6' = I_k - d'(dd')^{-1}d,$

$d = b(F_3'(F_2F_2')^{-1}F_3)^{-1/2}.$
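One convenient construction of $V_6$ (our choice; the proof only requires the two stated properties) takes the last $k-s$ columns of the orthogonal factor in a complete QR decomposition of $d'$. A minimal NumPy sketch, with `v6_from_d` a hypothetical name and d random:

```python
import numpy as np

def v6_from_d(d):
    """Return V6 (k x (k-s)) with V6'V6 = I_{k-s} and V6 V6' = I_k - d'(dd')^{-1} d.

    Assumes d (s x k) has full row rank s.
    """
    s = d.shape[0]
    # The first s columns of Q span the row space of d; the rest span its complement.
    Q, _ = np.linalg.qr(d.T, mode="complete")
    return Q[:, s:]

rng = np.random.default_rng(9)
d = rng.standard_normal((2, 6))
V6 = v6_from_d(d)
```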

Since $V_6$ is a column orthogonal matrix, each column of $\sigma^{-1}X_1^*F_4V_6$ (which is $p\times(k-s)$) is distributed with an independent normal distribution having covariance matrix $I_p$. The mean of $\sigma^{-1}X_1^*F_4V_6$ is

$E(\sigma^{-1}X_1^*F_4V_6) = \sigma^{-1}\Xi^*F_4V_6.$

Consider

$U = \sigma^{-1}\Gamma_1X_1^*F_4V_6\Gamma_2,$

where $\Gamma_1$ and $\Gamma_2$ are orthogonal matrices such that

$(N\sigma^2)^{-1/2}\Gamma_1\Xi^*F_4V_6\Gamma_2 = \bigl(\operatorname{diag}(\gamma_{1N}^{1/2}, \ldots, \gamma_{pN}^{1/2}),\ 0\bigr),$

and $\gamma_{iN}$ is the ith largest eigenvalue of

$(\sigma^2N)^{-1}\Xi^*F_4V_6V_6'F_4'\Xi^{*\prime} = (\sigma^2N)^{-1}\Xi^*F_4(I_k-d'(dd')^{-1}d)F_4'\Xi^{*\prime}.$

It should be noted that the r smallest eigenvalues of the above expression equal 0 by our hypothesis. We may write (4.2.2) the following way:

$|(N\sigma^2)^{-1}M - \phi^*I_p| = |N^{-1}UU' - \phi^*I_p| = 0.$

Finally, we make the following substitution. Let

$U^* = U - \sigma^{-1}\Gamma_1\Xi^*F_4V_6\Gamma_2.$

Then the columns of $U^*$ are independent, each having a normal distribution with mean vector 0 and covariance matrix $I_p$. We also have

$N^{-1}UU' = N^{-1}U^*U^{*\prime} + N^{-1/2}C + D_0,$

where C and $D_0$ are defined in the statement of the lemma. The lemma now follows. Q.E.D.


At this point we separate into three cases:

Case 1: k is fixed;

Case 2: $t_1 \ne 1$;

Case 3: k goes to infinity as N does, $t_1 = 1$;

where $t_1 = \lim_{N\to\infty}(N-k+s)N^{-1}$. We always assume that r (the number of rows of $U_2$), p (the number of rows in $\Xi$), and s (the row rank of b and d) are fixed quantities.

For  each  case,  we  now  present  important  results  about  the 
asymptotic  distribution  of  the  roots.  For  Case  1,  we  have  the 
following  theorem: 


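Theorem 4.2.1. Assume that k is fixed. Let $v_i = N\phi_i^*$, where $\phi_i^*$ is the ith largest root of $|(N\sigma^2)^{-1}M - \phi^*I_p| = 0$. Then the limiting distribution of $(v_{p-r+1}, v_{p-r+2}, \ldots, v_p)$ is

$\frac{\pi^{r/2}\,2^{-\frac12(k-s-p+r)r}}{\prod_{i=1}^{r}\Gamma\bigl(\frac12(k-s-p+r+1-i)\bigr)\Gamma\bigl(\frac12(r+1-i)\bigr)}\prod_{i=p-r+1}^{p}v_i^{\frac12(k-s-p-1)}\exp\Bigl(-\frac12\sum_{i=p-r+1}^{p}v_i\Bigr)\prod_{i=p-r+1}^{p}\prod_{j=i+1}^{p}(v_i-v_j).$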


Proof. By Lemma 1, we only have to consider the distribution of the r smallest roots ($\phi_i^*$, $i = p-r+1, p-r+2, \ldots, p$) of

(4.2.4) $|N^{-1}U^*U^{*\prime} + N^{-1/2}C + D_0 - \phi^*I_p| = 0,$

where C, $U^*$, and $D_0$ are defined in Lemma 1.

For Case 1 we can utilize the proof given in Hsu [1940]. Equation (17) in Hsu is identical to our equation (4.2.4) with the following correspondences:

All that we have to do is follow the steps in Hsu's proof. Q.E.D.

Remark. The distribution of the roots given in Theorem 4.2.1 is the same as the joint distribution of the $\rho_i$, where $\rho_i$ is the ith largest root of

$|B - \rho I_r| = 0$

and B is defined by

$B = \sum_{i=1}^{k-s-p+r}u_iu_i',$

where the $u_i$ are independently distributed with a normal distribution with mean vector 0 and covariance matrix $I_r$.
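For numerical work, the joint density of the ordered roots of the Wishart matrix B with $n = k-s-p+r$ degrees of freedom can be evaluated on the log scale. The sketch below (Python; `smallest_roots_density` is a hypothetical name) uses the standard Wishart-root normalizing constant $\pi^{r/2}2^{-nr/2}/\prod_{i=1}^{r}\Gamma(\frac12(n+1-i))\Gamma(\frac12(r+1-i))$. For $r = 1$ it reduces to the chi-square density with $k-s-p+1$ degrees of freedom, which serves as a check.

```python
import math
import numpy as np

def smallest_roots_density(v, k, s, p):
    """Joint density of the ordered roots v[0] > v[1] > ... > 0 of B (Remark above)."""
    v = np.asarray(v, dtype=float)
    r = v.size
    n = k - s - p + r                  # degrees of freedom of B
    log_c = 0.5 * r * math.log(math.pi) - 0.5 * n * r * math.log(2.0)
    log_c -= sum(math.lgamma(0.5 * (n + 1 - i)) + math.lgamma(0.5 * (r + 1 - i))
                 for i in range(1, r + 1))
    log_f = log_c + 0.5 * (n - r - 1) * np.log(v).sum() - 0.5 * v.sum()
    for i in range(r):                 # factor prod_{i<j} (v_i - v_j)
        for j in range(i + 1, r):
            log_f += math.log(v[i] - v[j])
    return math.exp(log_f)

# For r = 1 this is the chi-square density with k - s - p + 1 degrees of freedom.
print(smallest_roots_density([2.5], k=10, s=2, p=3))
```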

For Case 2 and Case 3, we make the following assumptions.

Assumption 1. The matrix

(4.2.5) $(\sigma^2N)^{-1}\Xi^*F_4(I_k-d'(dd')^{-1}d)F_4'\Xi^{*\prime}$

converges to a finite matrix $(\sigma^2)^{-1}R$ of rank $p-r$.
We now state and prove several theorems for Case 2.

Theorem 4.2.2. Assume that $\lim_{N\to\infty}(N-k)N^{-1} = t_1 \ne 1$, and that Assumptions 1 and 2 hold. Let

$v_i = (N\phi_i^* - k)(N-k)^{-1/2}; \quad i = p-r+1, p-r+2, \ldots, p;$

where $\phi_i^*$ is the ith largest eigenvalue of $(N\sigma^2)^{-1}M$. The limiting distribution of $(v_{p-r+1}, v_{p-r+2}, \ldots, v_p)$ is the same as the distribution of the r roots of

$|(1/t_1 - 1)^{1/2}Q_1 - vI_r| = 0,$

where $Q_1$ has the r-dimensional matrix normal distribution (see Lemma 2 of Chapter 2).

Proof. By Lemma 1, we only have to consider the distribution of the r smallest roots ($\phi_i^*$: $i = p-r+1, p-r+2, \ldots, p$) of

(4.2.5) $|N^{-1}U^*U^{*\prime} + N^{-1/2}C + D_0 - \phi^*I_p| = 0,$

where C, $U^*$, and $D_0$ are defined in Lemma 1.

If we multiply each matrix inside (4.2.5) by $N(N-k)^{-1}$ and let

$$D_1 = N(N-k)^{-1}D_0 - N(N-k)^{-1}\phi^* I_p, \qquad \Phi = N(N-k)^{-1}\phi^*,$$

then (4.2.5) becomes

$$|(N-k)^{-1}U^*U^{*\prime} + N^{1/2}(N-k)^{-1}C + D_1| = 0.$$


The above equation is exactly the same as equation (2.2.12) with Z = 0.
We may therefore follow the proof of Theorem 2.2.2 with Z = 0, which
proves the theorem. Q.E.D.


If  we  use  Theorem  2 in  Anderson  [1951b],  we  have 

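Theorem 4.2.3. Assume that $\lim_{N\to\infty}(N-k)N^{-1} = t_1 \neq 1$ and that Assumptions 1 and 2
hold. Let

$$\rho_i = (1/t_1 - 1)^{-1/2}(N\phi_i^* - k)(N-k)^{-1/2}, \qquad i = p-r+1, p-r+2, \ldots, p,$$

where $\phi_i^*$ is the ith largest eigenvalue of $(N\sigma^2)^{-1}M$. The limiting
distribution of $(\rho_{p-r+1}, \rho_{p-r+2}, \ldots, \rho_p)$ is

$$2^{-r(r+3)/4}\Big[\prod_{i=1}^{r}\Gamma\big(\tfrac{1}{2}(r+1-i)\big)\Big]^{-1} \exp\Big(-\tfrac{1}{4}\sum_{i=p-r+1}^{p}\rho_i^2\Big) \prod_{i=p-r+1}^{p}\ \prod_{j=i+1}^{p}(\rho_i - \rho_j).$$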


We  conclude  with  several  theorems  for  Case  3. 


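Theorem 4.2.4. Assume that $k \to \infty$ as $N \to \infty$, that $\lim_{N\to\infty}(N-k)N^{-1} = 1$,
and that Assumptions 1 and 2 hold. Let

$$v_i = \frac{N}{N-k}\big(Nk^{-1/2}\phi_i^* - k^{1/2}\big), \qquad i = p-r+1, p-r+2, \ldots, p,$$

where $\phi_i^*$ is the ith largest eigenvalue of $(N\sigma^2)^{-1}M$. The limiting
distribution of $(v_{p-r+1}, v_{p-r+2}, \ldots, v_p)$ is the distribution of the
r roots v of

$$|Q - v I_r| = 0,$$

where Q has the r-dimensional matrix normal distribution.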
Proof. By Lemma 1, we only have to consider the distribution of
the r smallest roots ($\phi_i^*$: i = p-r+1, p-r+2, ..., p) of

(4.2.6)  $|N^{-1}U^*U^{*\prime} + N^{-1/2}C + D_0 - \phi^* I_p| = 0,$

where $C$, $U^*$, and $D_0$ are defined in Lemma 1.

If we multiply each matrix inside (4.2.6) by $N(N-k)^{-1}$ and let

$$D_1 = N(N-k)^{-1}D_0 - N(N-k)^{-1}\phi^* I_p, \qquad \Phi = N(N-k)^{-1}\phi^*,$$

then (4.2.6) becomes

$$|(N-k)^{-1}U^*U^{*\prime} + N^{1/2}(N-k)^{-1}C + D_1| = 0.$$

The  above  equation  is  identical  to  equation  (2.2.15)  with  Z = 0. 

We  may  follow  the  proof  of  Theorem  2.2.4  with  Z = 0 to  get  the 
required  result.  Q.E.D. 


Using  Theorem  2 in  Anderson  [1951b],  we  have: 


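Theorem 4.2.5. Assume that $k \to \infty$ as $N \to \infty$, that $\lim_{N\to\infty}(N-k)N^{-1} = 1$,
and that Assumptions 1 and 2 hold. Let

$$v_i = \frac{N}{N-k}\big(Nk^{-1/2}\phi_i^* - k^{1/2}\big), \qquad i = p-r+1, p-r+2, \ldots, p,$$

where $\phi_i^*$ is the ith largest eigenvalue of $(N\sigma^2)^{-1}M$. Then the limiting
distribution of $(v_{p-r+1}, v_{p-r+2}, \ldots, v_p)$ is

$$2^{-r(r+3)/4}\Big[\prod_{i=1}^{r}\Gamma\big(\tfrac{1}{2}(r+1-i)\big)\Big]^{-1} \exp\Big(-\tfrac{1}{4}\sum_{i=p-r+1}^{p}v_i^2\Big) \prod_{i=p-r+1}^{p}\ \prod_{j=i+1}^{p}(v_i - v_j).$$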
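The proof of Theorem 4.3.2 uses the fact that tr Q has variance 2r. This fits the convention of diagonal variance 2 and off-diagonal variance 1, that is, a density proportional to exp(-tr(Q²)/4). Under that convention, a direct change of variables gives the root density exp(-¼Σv_i²)∏(v_i - v_j) with normalizing constant 2^{-r(r+3)/4}[∏_{i=1}^{r} Γ(½(r+1-i))]^{-1}. The standard-library sketch below checks this normalization numerically for r = 1 and r = 2. The convention for Q is an assumption here, since Lemma 2 of Chapter 2 is not reproduced in this section, and the function names are illustrative.

```python
import math


def symmetric_normal_root_density(y):
    """Joint density of the ordered roots y_1 > ... > y_r of an r x r symmetric
    matrix normal Q with density proportional to exp(-tr(Q^2) / 4), i.e.
    diagonal variance 2 and off-diagonal variance 1 (an assumed convention)."""
    r = len(y)
    if any(y[i] <= y[i + 1] for i in range(r - 1)):
        return 0.0  # outside the ordered region
    log_c = -0.25 * r * (r + 3) * math.log(2.0)
    log_c -= sum(math.lgamma(0.5 * (r + 1 - i)) for i in range(1, r + 1))
    vandermonde = 1.0
    for i in range(r):
        for j in range(i + 1, r):
            vandermonde *= y[i] - y[j]
    return math.exp(log_c - 0.25 * sum(x * x for x in y)) * vandermonde


def total_mass_r2(L=10.0, h=0.1):
    """Midpoint-rule integral of the r = 2 density over {y1 > y2} in [-L, L]^2."""
    n = int(round(2 * L / h))
    grid = [-L + (i + 0.5) * h for i in range(n)]
    total = 0.0
    for a in grid:
        for b in grid:
            if a > b:
                total += symmetric_normal_root_density([a, b])
    return total * h * h


# For r = 1 the density is that of N(0, 2); for r = 2 the total mass should be near 1.
print(symmetric_normal_root_density([0.0]), 1 / (2 * math.sqrt(math.pi)))
print(total_mass_r2())
```

The truncation to [-L, L] is harmless here because the integrand decays like exp(-L²/4).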
4.3. Asymptotic Tests of $U_1F_3 = ab$ Versus $U_1F_3 \neq ab$

In this section, we use the asymptotic distributions of the
smallest roots given in the preceding section to get the asymptotic
tests based on the likelihood ratio statistic derived in Theorem
4.1.1. We also use the following lemma:

Lemma 1. Let

(4.3.1)  $\theta = (\sigma^2)^{-1}\operatorname{tr}\big(XX' - F_1(F_1'F_1)^{-1}F_1'XF_2'(F_2F_2')^{-1}F_2X'\big).$

Then $(Nc-mp)^{-1/2}(\theta - (Nc-mp))$ converges in law to a normal random variable
with mean 0 and variance 2. We also have that $(Nc-mp)^{-1}\theta$ goes almost
surely to 1.


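Proof. We have shown that $\theta$ has a chi-square distribution with Nc-mp
degrees of freedom (see the end of Section 3.4). Hence $\theta$ can be written as a
sum of Nc-mp independent chi-square variables with one degree of freedom,
each with mean 1 and variance 2. The lemma now follows from the central
limit theorem and the strong law of large numbers. Q.E.D.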
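Lemma 1 uses only the fact that θ is chi-square with Nc-mp degrees of freedom, so both of its conclusions can be illustrated by simulation. The following sketch is illustrative only. The function name, the choice df = 100 (standing in for Nc-mp), the seed and the number of replications are arbitrary assumptions.

```python
import random
import statistics


def simulate_lemma1(df, reps, seed=0):
    """Monte Carlo illustration of Lemma 1 for theta ~ chi-square(df).

    theta is generated as a sum of df squared N(0, 1) draws; the function
    returns the standardized values (theta - df) / sqrt(df) and theta / df.
    """
    rng = random.Random(seed)
    standardized, ratios = [], []
    for _ in range(reps):
        theta = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        standardized.append((theta - df) / df ** 0.5)
        ratios.append(theta / df)
    return standardized, ratios


z, ratio = simulate_lemma1(df=100, reps=2000, seed=1)
# Expect mean(z) near 0, variance(z) near 2 and mean(ratio) near 1.
print(statistics.mean(z), statistics.variance(z), statistics.mean(ratio))
```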


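By Theorem 4.1.1, the likelihood ratio test statistic is

$$\Lambda_1 = (1 + \Lambda_2)^{-\frac{1}{2}cN},$$

where

$$\Lambda_2 = \sum_{i=p-r+1}^{p} \lambda_i/(\sigma^2\theta),$$

$\lambda_i$ is the ith largest eigenvalue of M, and $\theta$ is defined by (4.3.1).
In terms of the eigenvalues $(\phi_1^*, \phi_2^*, \ldots, \phi_p^*)$ of $(N\sigma^2)^{-1}M$ we have

$$\Lambda_2 = N\sum_{i=p-r+1}^{p}\phi_i^*/\theta.$$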
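As a small illustration of how the statistic is assembled, the sketch below computes Λ_2 in two ways: from the eigenvalues λ_i of M, and from the eigenvalues φ*_i of (Nσ²)^{-1}M. It also returns log Λ_1 using Λ_1 = (1 + Λ_2)^{-cN/2}, the form consistent with the relation Λ_2 = Λ_1^{-2/(cN)} - 1 used in Theorems 4.3.1 to 4.3.3. The function name and input layout are assumptions. Working on the log scale is a design choice, since Λ_1 underflows quickly when cN is large.

```python
import math


def lr_statistics(eigs_M, r, theta, sigma2, c, N):
    """Hypothetical helper: Lambda_2 and log(Lambda_1) from the eigenvalues of M.

    eigs_M : all p eigenvalues of M, in any order (the r smallest are used)
    theta  : the quantity defined in (4.3.1)
    sigma2 : sigma^2
    """
    smallest = sorted(eigs_M)[:r]               # lambda_{p-r+1}, ..., lambda_p
    lam2 = sum(smallest) / (sigma2 * theta)     # sum of lambda_i / (sigma^2 theta)
    phi = [x / (N * sigma2) for x in smallest]  # eigenvalues of (N sigma^2)^{-1} M
    lam2_via_phi = N * sum(phi) / theta         # the same quantity written with phi*_i
    log_lam1 = -0.5 * c * N * math.log1p(lam2)  # log of (1 + Lambda_2)^{-cN/2}
    return lam2, lam2_via_phi, log_lam1


lam2, lam2_phi, log_lam1 = lr_statistics([10.0, 4.0, 1.0, 0.5], r=2,
                                         theta=50.0, sigma2=2.0, c=1, N=20)
print(lam2, lam2_phi, math.exp(log_lam1))
```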




We  now  break  up  our  discussion  of  the  asymptotic  tests  into 
three  parts  which  correspond  to  the  three  cases  discussed  in  the 
preceding  section. 


Part  1.  Case  1:  k fixed. 

When  k is  fixed,  we  have  the  following  theorem: 


Theorem 4.3.1. If our model is given by (4.0.1), and we wish to
test the hypothesis $H_0$: $U_1F_3 = ab$ versus $H_1$: $U_1F_3 \neq ab$ when k is
fixed, then the asymptotic null distribution of

$$(cN-mp)\Lambda_2 = (cN-mp)(\Lambda_1^{-2/(cN)} - 1),$$

where $\Lambda_1$ is the likelihood ratio test statistic, is a chi-square
distribution with r(k-s-p+r) degrees of freedom. The $\alpha$-level
asymptotic test of $H_0$ versus $H_1$ is to reject $H_0$ when

$$(cN-mp)(\Lambda_1^{-2/(cN)} - 1) > \chi^2_{r(k-s-p+r)}(1-\alpha),$$

and not to reject otherwise, where $\chi^2_d(\beta)$ is the $\beta$th fractile of a
chi-square distribution with d degrees of freedom.
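The decision rule of Theorem 4.3.1 is simple to code. The sketch below is a hedged illustration that uses only the Python standard library. The chi-square fractile χ²_d(1-α) is found by bisection on a power-series evaluation of the chi-square CDF (the regularized lower incomplete gamma function). This stands in for a statistics library and is adequate for the moderate degrees of freedom r(k-s-p+r) that arise when k is fixed. All function names are hypothetical.

```python
import math


def chi2_cdf(x, df):
    """Chi-square CDF via the power series of the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = 0.5 * df, 0.5 * x
    term = 1.0 / a          # n = 0 term of sum_n z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_ppf(q, df):
    """The q-th fractile of chi-square(df), found by bisection on chi2_cdf."""
    lo, hi = 0.0, df + 12.0 * math.sqrt(2.0 * df) + 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def case1_test(lam1, c, N, m, p, r, k, s, alpha=0.05):
    """Sketch of the asymptotic alpha-level test of Theorem 4.3.1 (k fixed)."""
    stat = (c * N - m * p) * (lam1 ** (-2.0 / (c * N)) - 1.0)
    crit = chi2_ppf(1.0 - alpha, r * (k - s - p + r))   # chi^2_{r(k-s-p+r)}(1 - alpha)
    return stat, crit, stat > crit


print(case1_test(0.9, c=1, N=50, m=1, p=3, r=1, k=5, s=1))
```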

Proof: When k is fixed, the asymptotic distribution of $N\sum_{i=p-r+1}^{p}\phi_i^*$
can be easily obtained using the remark following Theorem 4.2.1.

The limiting distribution of $N\sum_{i=p-r+1}^{p}\phi_i^*$ is the same as the
distribution of

$$\operatorname{tr}(B) = \operatorname{tr}\sum_{i=1}^{k-s-p+r} u_iu_i',$$

where the $u_i$ are independently distributed with a normal distribution
having mean vector 0 and covariance matrix $I_r$. Since each diagonal
element of $\sum_{i=1}^{k-s-p+r} u_iu_i'$ has a chi-square distribution with k-s-p+r
degrees of freedom, and since the r diagonal elements are independent,
tr(B) has a chi-square distribution with r(k-s-p+r) degrees of freedom.
We conclude that the limiting distribution of $N\sum_{i=p-r+1}^{p}\phi_i^*$ (for Case 1)
is a chi-square distribution with r(k-s-p+r) degrees of freedom.

We know by Lemma 1 that $\theta/(Nc-mp)$ goes almost surely to one.
Since $\theta$ and $\sum_{i=p-r+1}^{p}\phi_i^*$ are independent, we get that

$$(Nc-mp)(\Lambda_1^{-2/(cN)} - 1) = N\sum_{i=p-r+1}^{p}\phi_i^*\big/\big(\theta/(Nc-mp)\big)$$

has a limiting chi-square distribution with r(k-s-p+r) degrees of
freedom. Q.E.D.
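The step from the Remark to the chi-square limit rests on tr(u_iu_i') = u_i'u_i. As a result, tr(B) is a sum of r(k-s-p+r) independent squared standard normal variables. The short Monte Carlo sketch below is illustrative only, and its function name, seed and replication count are arbitrary choices. It checks the implied mean r(k-s-p+r) and variance 2r(k-s-p+r) for n = k-s-p+r = 4 and r = 2.

```python
import random
import statistics


def simulate_trace_B(n, r, reps, seed=0):
    """Monte Carlo draws of tr(B), where B = sum_{i=1}^{n} u_i u_i' and u_i ~ N(0, I_r).

    Because tr(u u') = u'u, each draw is a sum of n * r squared N(0, 1) variables.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        trace = 0.0
        for _ in range(n):
            u = [rng.gauss(0.0, 1.0) for _ in range(r)]
            trace += sum(x * x for x in u)   # tr(u u') = u'u
        draws.append(trace)
    return draws


# n = k - s - p + r = 4 and r = 2: tr(B) should be chi-square with 8 degrees of freedom,
# so the sample mean should be near 8 and the sample variance near 16.
d = simulate_trace_B(n=4, r=2, reps=5000, seed=2)
print(statistics.mean(d), statistics.variance(d))
```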

Part 2. Case 2: $t_1 = \lim_{N\to\infty}(N-k)N^{-1} \neq 1$.

When the number of parameters increases with the sample size in
such a way that $t_1 = \lim_{N\to\infty}(N-k)N^{-1} \neq 1$, we use the following theorem,
which gives us the needed asymptotic test:

Theorem 4.3.2. If our model is given by (4.0.1), and we wish to
test the hypothesis $H_0$: $U_1F_3 = ab$ versus $H_1$: $U_1F_3 \neq ab$ when
$t_1 \neq 1$, then the asymptotic null distribution of

$$\Lambda_3 = \Big(\frac{Nc-pm}{2r(Nc-pm)+2kr^2}\Big)^{1/2}\big(k^{-1/2}(Nc-pm)\Lambda_2 - k^{1/2}r\big),$$

where $\Lambda_2 = \Lambda_1^{-2/(cN)} - 1$ and $\Lambda_1$ is the likelihood ratio test statistic,
is a standard normal distribution. The $\alpha$-level asymptotic test
is to reject $H_0$ when

$$\Lambda_3 > z_{1-\alpha},$$

and not to reject otherwise, where $z_\beta$ is the $\beta$ fractile of a standard
normal distribution.
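A minimal sketch of the Case 2 decision rule follows. It uses `statistics.NormalDist` from the Python standard library for the normal fractile z_{1-α}. The function name and the choice to pass Λ_2 instead of Λ_1 are assumptions made for the illustration. Passing Λ_2 avoids underflow of Λ_1 when cN is large.

```python
import math
from statistics import NormalDist


def case2_test(lam2, c, N, m, p, r, k, alpha=0.05):
    """Sketch of the asymptotic alpha-level test of Theorem 4.3.2 (t_1 != 1).

    lam2 is Lambda_2 = Lambda_1^{-2/(cN)} - 1.
    """
    n_res = N * c - p * m                                  # Nc - pm
    scale = math.sqrt(n_res / (2 * r * n_res + 2 * k * r ** 2))
    lam3 = scale * (n_res * lam2 / math.sqrt(k) - math.sqrt(k) * r)
    crit = NormalDist().inv_cdf(1.0 - alpha)               # z_{1-alpha}
    return lam3, crit, lam3 > crit


# Lambda_2 = k r / (Nc - pm) gives Lambda_3 = 0, so H_0 is not rejected.
print(case2_test(40 / 98, c=1, N=100, m=1, p=2, r=1, k=40))
```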


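Proof. Consider

(4.3.2)
$$k^{-1/2}(Nc-pm)\Lambda_2 - k^{1/2}r = \frac{\big(N\sum_{i=p-r+1}^{p}\phi_i^*\big)k^{-1/2}}{\theta/(Nc-pm)} - k^{1/2}r = \frac{\sum_{i=p-r+1}^{p}(N\phi_i^*-k)k^{-1/2} - k^{1/2}r\,[(\theta/(Nc-pm))-1]}{\theta/(Nc-pm)}.$$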
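Identity (4.3.2) is purely algebraic. It uses Λ_2 = NΣφ*_i/θ and the fact that the r terms k·k^{-1/2} sum to k^{1/2}r, so it can be spot-checked numerically. The helper names and input values in this sketch are arbitrary illustrations, and r is taken as the number of φ*_i supplied.

```python
import math


def lhs_432(phi, theta, N, k, n_res):
    """k^{-1/2}(Nc - pm) Lambda_2 - k^{1/2} r, with Lambda_2 = N sum(phi) / theta."""
    r = len(phi)
    lam2 = N * sum(phi) / theta
    return n_res * lam2 / math.sqrt(k) - math.sqrt(k) * r


def rhs_432(phi, theta, N, k, n_res):
    """Right-hand side of (4.3.2), with w = theta / (Nc - pm)."""
    r = len(phi)
    w = theta / n_res
    numerator = sum((N * x - k) / math.sqrt(k) for x in phi) - math.sqrt(k) * r * (w - 1.0)
    return numerator / w


# Arbitrary illustrative values: r = 2 roots, N = 200, k = 60, Nc - pm = 394.
print(lhs_432([0.41, 0.33], 380.0, 200, 60, 394), rhs_432([0.41, 0.33], 380.0, 200, 60, 394))
```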


Since $\theta/(Nc-pm)$ goes almost surely to one, we have that the
limiting distribution of $k^{-1/2}(Nc-pm)\Lambda_2 - k^{1/2}r$
is the limiting distribution of

(4.3.3)  $\sum_{i=p-r+1}^{p}(N\phi_i^*-k)k^{-1/2} - k^{1/2}r\,((\theta/(Nc-pm))-1).$


When $t_1 \neq 1$, the asymptotic distribution of

$$\sum_{i=p-r+1}^{p}(N\phi_i^*-k)k^{-1/2} = \big((N-k)/k\big)^{1/2}\sum_{i=1}^{r}v_{p-r+i}$$

can be easily obtained from Theorem 4.2.2. The limiting distribution
of $\sum_{i=p-r+1}^{p}v_i$ is the same as the distribution of $\operatorname{tr}(1/t_1-1)^{1/2}Q_1$, where
$Q_1$ has an r-dimensional matrix normal distribution. Since the
distribution of $\operatorname{tr}(1/t_1-1)^{1/2}Q_1$ is a normal distribution with mean
zero and variance $2r(1/t_1-1)$, it follows that the limiting distribution of

$$\sum_{i=p-r+1}^{p}(N\phi_i^*-k)k^{-1/2}$$

is a normal distribution with mean 0 and variance 2r.
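The passage from variance 2r(1/t_1 - 1) to variance 2r uses the limit of the factor ((N-k)/k)^{1/2}. Written out, assuming 0 < t_1 < 1 as is implicit in Case 2:

```latex
\frac{N-k}{k} = \frac{(N-k)/N}{k/N} \longrightarrow \frac{t_1}{1-t_1},
\qquad
\frac{t_1}{1-t_1}\cdot 2r\Big(\frac{1}{t_1}-1\Big)
= \frac{t_1}{1-t_1}\cdot\frac{2r(1-t_1)}{t_1} = 2r .
```

By Slutsky's theorem, ((N-k)/k)^{1/2} times Σv_i therefore has a limiting normal distribution with mean 0 and variance 2r.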

We also know from Lemma 1 that

$$(Nc-pm)^{1/2}\big((\theta/(Nc-pm))-1\big)$$

is asymptotically distributed as a normal random variable with mean 0
and variance 2. We therefore have that

$$k^{1/2}r\big((\theta/(Nc-pm))-1\big)$$

has a limiting normal distribution with mean 0 and variance
$\lim_{N\to\infty} 2kr^2/(Nc-pm)$.

If we combine the above three paragraphs and recall that $\theta$
and $\sum_{i=p-r+1}^{p}\phi_i^*$ are independent, we have that

$$k^{-1/2}(Nc-pm)\Lambda_2 - k^{1/2}r$$

has a limiting normal distribution with mean 0 and variance
$2r + 2r^2\lim_{N\to\infty} k/(Nc-pm)$. Since

$$[2r + 2r^2k/(Nc-pm)]^{-1} = (Nc-pm)/(2r(Nc-pm)+2kr^2),$$

we have finally that

$$\Big(\frac{Nc-pm}{2r(Nc-pm)+2kr^2}\Big)^{1/2}\big(k^{-1/2}(Nc-pm)\Lambda_2 - k^{1/2}r\big)$$

has a limiting normal distribution with mean 0 and variance 1.
Q.E.D.


Part 3. Case 3: $k \to \infty$ as $N \to \infty$ but $t_1 = \lim_{N\to\infty}(N-k)N^{-1} = 1$.

We conclude this section with a theorem which gives the
asymptotic test of $H_0$ versus $H_1$ for Case 3.


Theorem 4.3.3. If our model is given by (4.0.1), and we wish to
test the hypothesis $H_0$: $U_1F_3 = ab$ versus $H_1$: $U_1F_3 \neq ab$ when
$\lim_{N\to\infty}(N-k)/N = t_1 = 1$ and $k \to \infty$ as $N \to \infty$, then the asymptotic null
distribution of

$$\Lambda_3 = (Nc-pm)(2kr)^{-1/2}\Lambda_2 - (kr/2)^{1/2},$$

where $\Lambda_2 = \Lambda_1^{-2/(cN)} - 1$ and $\Lambda_1$ is the likelihood ratio test statistic,
is a standard normal distribution. The $\alpha$-level asymptotic test is to
reject $H_0$ when $\Lambda_3 > z_{1-\alpha}$ and not to reject otherwise, where
$z_\beta$ is the $\beta$th fractile of a standard normal distribution.
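A sketch of the Case 3 rule follows in the same hedged spirit. As before, the function name is hypothetical and Λ_2 is passed directly. Λ_3 = 0 exactly when Λ_2 = kr/(Nc-pm), which is also the value that makes the Case 2 statistic vanish.

```python
import math
from statistics import NormalDist


def case3_test(lam2, c, N, m, p, r, k, alpha=0.05):
    """Sketch of the asymptotic alpha-level test of Theorem 4.3.3 (k -> infinity, t_1 = 1).

    lam2 is Lambda_2 = Lambda_1^{-2/(cN)} - 1.
    """
    n_res = N * c - p * m                                   # Nc - pm
    lam3 = n_res * lam2 / math.sqrt(2 * k * r) - math.sqrt(k * r / 2)
    crit = NormalDist().inv_cdf(1.0 - alpha)                # z_{1-alpha}
    return lam3, crit, lam3 > crit


# Lambda_2 = k r / (Nc - pm) gives Lambda_3 = 0, so H_0 is not rejected.
print(case3_test(200 / 9997, c=1, N=10000, m=1, p=3, r=2, k=100))
```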



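Proof. Consider

$$\Lambda_3 = (Nc-pm)(2kr)^{-1/2}\Lambda_2 - (kr/2)^{1/2} = \frac{N(2kr)^{-1/2}\sum_{i=p-r+1}^{p}\phi_i^*}{\theta/(Nc-pm)} - (kr/2)^{1/2} = \frac{N(2kr)^{-1/2}\sum_{i=p-r+1}^{p}\phi_i^* - (kr/2)^{1/2} - (kr/2)^{1/2}[(\theta/(Nc-pm))-1]}{\theta/(Nc-pm)}.$$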


Since $N^{1/2}((\theta/(Nc-pm))-1)$ goes in law (by Lemma 1) to a normal random
variable with mean 0 and finite variance, and since for Case 3
$\lim k/N = 0$, we have that

$$k^{1/2}((\theta/(Nc-pm))-1) = (k/N)^{1/2}\,N^{1/2}((\theta/(Nc-pm))-1)$$

goes in law to a random variable which is constant at zero.

By Lemma 1, we may state that $\theta/(Nc-pm)$ goes almost surely to 1.
Since $\theta$ and $\sum_{i=p-r+1}^{p}\phi_i^*$ are independent, we know that the asymptotic
distribution of $\Lambda_3$ is the same as the asymptotic distribution of

(4.3.4)  $N(2kr)^{-1/2}\sum_{i=p-r+1}^{p}\phi_i^* - (kr/2)^{1/2}.$

For Case 3, the asymptotic distribution of the above expression
can be easily obtained from Theorem 4.2.4. Since the limiting
distribution of

$$\sum_{i=p-r+1}^{p} v_i = \frac{N}{N-k}\Big[Nk^{-1/2}\sum_{i=p-r+1}^{p}\phi_i^* - rk^{1/2}\Big]$$

is the same as the distribution of tr Q, where Q has the r-dimensional
matrix normal distribution, $\sum_{i=p-r+1}^{p} v_i$ has a limiting
normal distribution with mean 0 and variance 2r. Since
$\lim_{N\to\infty}(N-k)/N = 1$, we can conclude that the limiting distribution of
(4.3.4) is a standard normal distribution. Q.E.D.
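The last step can be made explicit. Write S for Σ_{i=p-r+1}^{p} φ*_i and use (kr/2)^{1/2} = (2r)^{-1/2} r k^{1/2}. Then expression (4.3.4) is a rescaled Σv_i:

```latex
N(2kr)^{-1/2} S - (kr/2)^{1/2}
= (2r)^{-1/2}\big[N k^{-1/2} S - r k^{1/2}\big]
= (2r)^{-1/2}\,\frac{N-k}{N}\sum_{i=p-r+1}^{p} v_i .
```

Since (N-k)/N → 1, the limit is normal with mean 0 and variance (2r)^{-1}·2r = 1.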

4.4.  Consistency  of  the  Tests 

In this section, we discuss the consistency of the tests from
the preceding section. A test is consistent if its power goes to
one as the sample size increases when a fixed alternative
is assumed to be true.
We now describe our fixed alternative.
For each N, let us pick the parameters (which may depend on N) so that the rth
smallest eigenvalue of

(4.4.1)  $N^{-1} H^* F_4 (I_k - d'(dd')^{-1}d) F_4' H^{*\prime}$,

is  a fixed  positive  number.  F4  and  d are  defined  in  Theorem  4.1.1, 
and 

% * <FiFl>  50N<F2F2>  • 

This is a reasonable definition of a fixed alternative. We also
assume that the matrix given by (4.4.1) converges to a finite matrix
R.

For  Case  1,  we  have  the  following  theorem: 

Theorem  4.4.1.  The  asymptotic  test  given  in  Theorem  4.3.1  (Case  1: 
k fixed)  is  consistent. 

Proof. The test statistic given in Theorem 4.3.1 is

$$(Nc-mp)\Lambda_2 = (Nc-mp)(\Lambda_1^{-2/(cN)} - 1) = N\sum_{i=p-r+1}^{p}\phi_i^*\big/\big(\theta/(Nc-mp)\big),$$

where $\theta$ is given by (4.3.1) and the $\phi_i^*$ are the eigenvalues of $(N\sigma^2)^{-1}M$ (see
Theorem 4.1.1). By Lemma 1 of Section 3.2, $\theta/(Nc-mp)$ goes almost
surely to one. By i) of Theorem 3.4.3, we have that

$$N^{-1}M = N^{-1}X^*F_4(I_k - d'(dd')^{-1}d)F_4'X^{*\prime}$$

goes almost surely to $R + (1-t_1)\sigma^2 I_p$, where $t_1 = \lim_{N\to\infty}(N-k)N^{-1}$. Since
k is fixed, $t_1 = 1$. Therefore, in this case, $N^{-1}M$ goes almost surely
to R. Since the eigenvalues of a matrix are continuous functions of
that matrix, the rth smallest eigenvalue of $N^{-1}M$ goes almost surely to
the rth smallest eigenvalue of R. For our fixed alternative (see
the paragraph preceding this theorem), $R = R_0$ and the rth smallest
eigenvalue of $R_0$ is some positive number. We can conclude that
$N\phi_{p-r+1}^*$ goes almost surely to positive infinity. Therefore $(Nc-mp)\Lambda_2$
goes almost surely to positive infinity. If we apply Theorem 2.4.1,
our theorem (Theorem 4.4.1) follows. Q.E.D.

For Case 2, $t_1 \neq 1$, we have a similar result:

Theorem  4.4.2.  The  asymptotic  test  given  in  Theorem  4.3.2  (Case  2) 
is  consistent. 


Proof. Let us consider

$$k^{-1/2}(Nc-pm)\Lambda_2 - k^{1/2}r = \frac{\sum_{i=p-r+1}^{p}(N\phi_i^*-k)k^{-1/2} - k^{1/2}r\,[(\theta/(Nc-pm))-1]}{\theta/(Nc-pm)},$$

which is equation (4.3.2). By Lemma 1, $\theta/(Nc-pm)$ goes almost surely
to 1. By i) of Theorem 3.4.3 we have that

$$N^{-1}M = N^{-1}X^*F_4(I_k - d'(dd')^{-1}d)F_4'X^{*\prime}$$

goes almost surely to $R + (1-t_1)\sigma^2 I_p$, where $t_1 = \lim_{N\to\infty}(N-k)N^{-1}$. Since
the eigenvalues of a matrix are continuous functions of that matrix,
the rth smallest eigenvalue of $N^{-1}M$ goes almost surely to the rth
smallest eigenvalue of $R + (1-t_1)\sigma^2 I_p$. For our fixed alternative (see the
paragraph preceding Theorem 4.4.1), $R = R_0$, and the rth smallest
eigenvalue of $N^{-1}M$ goes almost surely to a quantity greater than
$(1-t_1)\sigma^2$. We therefore have that $\phi_{p-r+1}^*$ goes almost surely to
a quantity greater than $1-t_1$, and that $\phi_{p-r+1}^* - k/N$ goes almost surely
to a quantity greater than 0. We conclude that

$$\sum_{i=p-r+1}^{p}(N\phi_i^*-k)k^{-1/2}$$

goes almost surely to positive infinity. $\Lambda_3$ then goes to positive
infinity. We now may apply Theorem 2.4.1 to complete the proof of
this theorem. Q.E.D.
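For the leading term i = p-r+1, the divergence can be seen directly. Assume 0 < t_1 < 1, so that k/N → 1 - t_1 > 0 and hence k → ∞. Then:

```latex
(N\phi_{p-r+1}^* - k)\,k^{-1/2}
= \frac{N}{k^{1/2}}\Big(\phi_{p-r+1}^* - \frac{k}{N}\Big),
\qquad
\frac{N}{k^{1/2}} = N^{1/2}\Big(\frac{N}{k}\Big)^{1/2} \longrightarrow \infty .
```

The factor in parentheses converges almost surely to a positive limit by the argument above, so this term tends to positive infinity almost surely.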

For  Case  3,  we  have  the  following  theorem. 

Theorem 4.4.3. The asymptotic test described in Theorem 4.3.3 is
consistent.

Proof. We omit the proof, since it is similar to the proofs
of Theorem 4.4.1 and Theorem 4.4.2.